Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/16 15:03:52 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #733: EXPLAIN VERBOSE does not include all the different passes nor final physical plan

alamb opened a new issue #733:
URL: https://github.com/apache/arrow-datafusion/issues/733


   **Describe the bug**
   The output of `EXPLAIN VERBOSE` includes neither all of the optimizer passes nor the final physical plan.
   
   **To Reproduce**
   Run `EXPLAIN VERBOSE SELECT ...`
   
   **Expected behavior**
   I expect all the optimizer passes to be shown, as well as the physical plan. In practice, only `projection_push_down` and `simplify_expressions` are shown, even though I know (from adding `println` calls to the code) that other passes such as `aggregate_statistics` are being run.
   
   **Additional context**
   I was working on adding some tests in IOx based on explain plans, and I expected to see the results of statistics replacement in the explain plan. That is, I expected to see `count(*)` rewritten to `num_rows` by AggregateStatistics; see https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/aggregate_statistics.rs#L41
   
   Not only was the optimizer pass missing from the `EXPLAIN VERBOSE` output, but its results were also not reflected in the explain plan.
   
   Here is an example of what came out:
   ```
   : EXPLAIN VERBOSE SELECT count(*) from h2o;
   +-----------------------------------------+-----------------------------------------------------------------------------+
   | plan_type                               | plan                                                                        |
   +-----------------------------------------+-----------------------------------------------------------------------------+
   | logical_plan                            | Projection: #COUNT(UInt8(1))                                                |
   |                                         |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]                         |
   |                                         |     TableScan: h2o projection=None                                          |
   | logical_plan after projection_push_down | Projection: #COUNT(UInt8(1))                                                |
   |                                         |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]                         |
   |                                         |     TableScan: h2o projection=Some([0])                                     |
   | logical_plan after simplify_expressions | Projection: #COUNT(UInt8(1))                                                |
   |                                         |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]                         |
   |                                         |     TableScan: h2o projection=Some([0])                                     |
   | physical_plan                           | ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))]                 |
   |                                         |   HashAggregateExec: mode=Final, gby=[], aggr=[COUNT(UInt8(1))]             |
   |                                         |     HashAggregateExec: mode=Partial, gby=[], aggr=[COUNT(UInt8(1))]         |
   |                                         |       ProjectionExec: expr=[city@0 as city]                                 |
   |                                         |         DeduplicateExec: [city@0 ASC,state@1 ASC,time@2 ASC]                |
   |                                         |           SortExec: [city@0 ASC,state@1 ASC,time@2 ASC]                     |
   |                                         |             IOxReadFilterNode: table_name=h2o, chunks=1 predicate=Predicate |
   +-----------------------------------------+-----------------------------------------------------------------------------+
   ```
   
   Here is what should have been shown (note that the actual scan has been removed). I got this output after adding a call to `optimize_explain` in AggregateStatistics:
   
   ```
   +-----------------------------------------+-------------------------------------------------------------+
   | plan_type                               | plan                                                        |
   +-----------------------------------------+-------------------------------------------------------------+
   | logical_plan                            | Projection: #COUNT(UInt8(1))                                |
   |                                         |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]         |
   |                                         |     TableScan: h2o projection=None                          |
   | logical_plan after aggregate_statistics | Projection: #COUNT(UInt8(1))                                |
   |                                         |   Projection: UInt64(3) AS COUNT(Uint8(1))                  |
   |                                         |     EmptyRelation                                           |
   | logical_plan after projection_push_down | Projection: #COUNT(UInt8(1))                                |
   |                                         |   Projection: UInt64(3) AS COUNT(Uint8(1))                  |
   |                                         |     EmptyRelation                                           |
   | logical_plan after simplify_expressions | Projection: #COUNT(UInt8(1))                                |
   |                                         |   Projection: UInt64(3) AS COUNT(Uint8(1))                  |
   |                                         |     EmptyRelation                                           |
   | physical_plan                           | ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(Uint8(1))] |
   |                                         |   ProjectionExec: expr=[3 as COUNT(Uint8(1))]               |
   |                                         |     EmptyExec: produce_one_row=true                         |
   +-----------------------------------------+-------------------------------------------------------------+
   ```
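   
   To make the expected behavior concrete, here is a minimal toy sketch of an optimizer loop that records the plan after every pass, including passes that leave the plan unchanged. Everything in it is an assumption for illustration: `Pass`, `ExplainRow`, `optimize_verbose`, and `example_passes` are hypothetical names, plans are modeled as plain strings, and none of this is DataFusion's actual API or the real `optimize_explain` implementation.
   
   ```rust
   // Toy model only: hypothetical types, NOT DataFusion's API.
   
   /// A hypothetical optimizer pass: a name plus a rewrite over the plan text.
   struct Pass {
       name: &'static str,
       rewrite: fn(&str) -> String,
   }
   
   /// One row of the explain output: (plan_type, plan).
   #[derive(Debug, PartialEq)]
   struct ExplainRow {
       plan_type: String,
       plan: String,
   }
   
   /// Run every pass in order and record the plan after each one,
   /// so that no pass is silently missing from the verbose output.
   fn optimize_verbose(initial: String, passes: &[Pass]) -> (String, Vec<ExplainRow>) {
       let mut rows = vec![ExplainRow {
           plan_type: "logical_plan".to_string(),
           plan: initial.clone(),
       }];
       let mut current = initial;
       for pass in passes {
           current = (pass.rewrite)(current.as_str());
           rows.push(ExplainRow {
               plan_type: format!("logical_plan after {}", pass.name),
               plan: current.clone(),
           });
       }
       (current, rows)
   }
   
   /// Stand-in for a statistics-based rewrite: replaces a full-table count
   /// with a literal (the value 3 is made up for this sketch).
   fn aggregate_statistics(plan: &str) -> String {
       plan.replace("Aggregate: COUNT over TableScan", "Projection: UInt64(3)")
   }
   
   /// Stand-in for a pass that does not change this particular plan.
   fn unchanged(plan: &str) -> String {
       plan.to_string()
   }
   
   /// The three pass names from the output above, in the same order.
   fn example_passes() -> Vec<Pass> {
       vec![
           Pass { name: "aggregate_statistics", rewrite: aggregate_statistics },
           Pass { name: "projection_push_down", rewrite: unchanged },
           Pass { name: "simplify_expressions", rewrite: unchanged },
       ]
   }
   ```
   
   Calling `optimize_verbose("Aggregate: COUNT over TableScan".to_string(), &example_passes())` yields four rows: the initial plan plus one row per pass. The returned final plan is the one a physical planner would consume, which is why the rewrite would show up in `physical_plan` as well.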
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #733: EXPLAIN VERBOSE does not include all the different passes nor final physical plan

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #733:
URL: https://github.com/apache/arrow-datafusion/issues/733#issuecomment-881522136


   Maybe this would be a good time to include the "optimized explain plan" requested by @Dandandan in https://github.com/apache/arrow-datafusion/issues/221





[GitHub] [arrow-datafusion] alamb commented on issue #733: EXPLAIN VERBOSE does not include all the different passes nor final physical plan

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #733:
URL: https://github.com/apache/arrow-datafusion/issues/733#issuecomment-881521268


   It looks like @NGA-TRAN has filed something similar in https://github.com/apache/arrow-datafusion/issues/499 and I filed something similar in #497





[GitHub] [arrow-datafusion] jorgecarleitao closed issue #733: EXPLAIN VERBOSE does not include all the different passes nor final physical plan

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed issue #733:
URL: https://github.com/apache/arrow-datafusion/issues/733


   

