Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/13 17:01:17 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request #337: Implement readable explain plans for physical plans

alamb opened a new pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337


   # Which issue does this PR close?
   https://github.com/apache/arrow-datafusion/issues/333
   
    # Rationale for this change
   EXPLAIN output for physical plans is currently close to useless (in my opinion). 
   
   # What changes are included in this PR?
   * Visitor pattern to traverse `ExecutionPlans`
   * New `displayable` function for displaying `ExecutionPlans` reasonably
   * Documentation and test
   
   Note: I hope to use the same basic infrastructure to implement Graphviz plans: https://github.com/apache/arrow-datafusion/issues/219
   
   # Example new format 
   ```
   > explain verbose select * from foo where a < 4;
   +-----------------------------------------+------------------------------------------------------------------------+
   | plan_type                               | plan                                                                   |
   +-----------------------------------------+------------------------------------------------------------------------+
   | logical_plan                            | Projection: #a, #b, #c                                                 |
   |                                         |   Filter: #a Lt Int64(4)                                               |
   |                                         |     TableScan: foo projection=None                                     |
   | logical_plan after projection_push_down | Projection: #a, #b, #c                                                 |
   |                                         |   Filter: #a Lt Int64(4)                                               |
   |                                         |     TableScan: foo projection=Some([0, 1, 2])                          |
   | logical_plan after projection_push_down | Projection: #a, #b, #c                                                 |
   |                                         |   Filter: #a Lt Int64(4)                                               |
   |                                         |     TableScan: foo projection=Some([0, 1, 2])                          |
   | physical_plan                           | ProjectionExec: expr=[a, b, c]                                         |
   |                                         |  FilterExec: CAST(a AS Int64) < 4                                      |
   |                                         |   CsvExec: source=Path(/tmp/foo.csv: [/tmp/foo.csv]), has_header=false |
   +-----------------------------------------+------------------------------------------------------------------------+
   ```
   
   # Are there any user-facing changes?
   Yes: the output format for `EXPLAIN VERBOSE` has changed (see the example new format above).
   
   # API changes
   All changes are backwards compatible.
   
   
   # Example old format 
   ```
   > explain verbose select * from foo where a < 4;
   
   +-----------------------------------------+-------------------------------------------------+
   | plan_type                               | plan                                            |
   +-----------------------------------------+-------------------------------------------------+
   | logical_plan                            | Projection: #a, #b, #c                          |
   |                                         |   Filter: #a Lt Int64(4)                        |
   |                                         |     TableScan: foo projection=None              |
   | logical_plan after projection_push_down | Projection: #a, #b, #c                          |
   |                                         |   Filter: #a Lt Int64(4)                        |
   |                                         |     TableScan: foo projection=Some([0, 1, 2])   |
   | logical_plan after projection_push_down | Projection: #a, #b, #c                          |
   |                                         |   Filter: #a Lt Int64(4)                        |
   |                                         |     TableScan: foo projection=Some([0, 1, 2])   |
   | physical_plan                           | ProjectionExec {                                |
   |                                         |     expr: [                                     |
   |                                         |         (                                       |
   |                                         |             Column {                            |
   |                                         |                 name: "a",                      |
   |                                         |             },                                  |
   |                                         |             "a",                                |
   |                                         |         ),                                      |
   |                                         |         (                                       |
   |                                         |             Column {                            |
   |                                         |                 name: "b",                      |
   |                                         |             },                                  |
   |                                         |             "b",                                |
   |                                         |         ),                                      |
   |                                         |         (                                       |
   |                                         |             Column {                            |
   |                                         |                 name: "c",                      |
   |                                         |             },                                  |
   |                                         |             "c",                                |
   |                                         |         ),                                      |
   |                                         |     ],                                          |
   |                                         |     schema: Schema {                            |
   |                                         |         fields: [                               |
   |                                         |             Field {                             |
   |                                         |                 name: "a",                      |
   |                                         |                 data_type: Int32,               |
   |                                         |                 nullable: false,                |
   |                                         |                 dict_id: 0,                     |
   |                                         |                 dict_is_ordered: false,         |
   |                                         |                 metadata: None,                 |
   |                                         |             },                                  |
   |                                         |             Field {                             |
   |                                         |                 name: "b",                      |
   |                                         |                 data_type: Int32,               |
   |                                         |                 nullable: false,                |
   |                                         |                 dict_id: 0,                     |
   |                                         |                 dict_is_ordered: false,         |
   |                                         |                 metadata: None,                 |
   |                                         |             },                                  |
   |                                         |             Field {                             |
   |                                         |                 name: "c",                      |
   |                                         |                 data_type: Int32,               |
   |                                         |                 nullable: false,                |
   |                                         |                 dict_id: 0,                     |
   |                                         |                 dict_is_ordered: false,         |
   |                                         |                 metadata: None,                 |
   |                                         |             },                                  |
   |                                         |         ],                                      |
   |                                         |         metadata: {},                           |
   |                                         |     },                                          |
   |                                         |     input: FilterExec {                         |
   |                                         |         predicate: BinaryExpr {                 |
   |                                         |             left: TryCastExpr {                 |
   |                                         |                 expr: Column {                  |
   |                                         |                     name: "a",                  |
   |                                         |                 },                              |
   |                                         |                 cast_type: Int64,               |
   |                                         |             },                                  |
   |                                         |             op: Lt,                             |
   |                                         |             right: Literal {                    |
   |                                         |                 value: Int64(4),                |
   |                                         |             },                                  |
   |                                         |         },                                      |
   |                                         |         input: CsvExec {                        |
   |                                         |             source: PartitionedFiles {          |
   |                                         |                 path: "/tmp/foo.csv",           |
   |                                         |                 filenames: [                    |
   |                                         |                     "/tmp/foo.csv",             |
   |                                         |                 ],                              |
   |                                         |             },                                  |
   |                                         |             schema: Schema {                    |
   |                                         |                 fields: [                       |
   |                                         |                     Field {                     |
   |                                         |                         name: "a",              |
   |                                         |                         data_type: Int32,       |
   |                                         |                         nullable: false,        |
   |                                         |                         dict_id: 0,             |
   |                                         |                         dict_is_ordered: false, |
   |                                         |                         metadata: None,         |
   |                                         |                     },                          |
   |                                         |                     Field {                     |
   |                                         |                         name: "b",              |
   |                                         |                         data_type: Int32,       |
   |                                         |                         nullable: false,        |
   |                                         |                         dict_id: 0,             |
   |                                         |                         dict_is_ordered: false, |
   |                                         |                         metadata: None,         |
   |                                         |                     },                          |
   |                                         |                     Field {                     |
   |                                         |                         name: "c",              |
   |                                         |                         data_type: Int32,       |
   |                                         |                         nullable: false,        |
   |                                         |                         dict_id: 0,             |
   |                                         |                         dict_is_ordered: false, |
   |                                         |                         metadata: None,         |
   |                                         |                     },                          |
   |                                         |                 ],                              |
   |                                         |                 metadata: {},                   |
   |                                         |             },                                  |
   |                                         |             has_header: false,                  |
   |                                         |             delimiter: Some(                    |
   |                                         |                 44,                             |
   |                                         |             ),                                  |
   |                                         |             file_extension: ".csv",             |
   |                                         |             projection: Some(                   |
   |                                         |                 [                               |
   |                                         |                     0,                          |
   |                                         |                     1,                          |
   |                                         |                     2,                          |
   |                                         |                 ],                              |
   |                                         |             ),                                  |
   |                                         |             projected_schema: Schema {          |
   |                                         |                 fields: [                       |
   |                                         |                     Field {                     |
   |                                         |                         name: "a",              |
   |                                         |                         data_type: Int32,       |
   |                                         |                         nullable: false,        |
   |                                         |                         dict_id: 0,             |
   |                                         |                         dict_is_ordered: false, |
   |                                         |                         metadata: None,         |
   |                                         |                     },                          |
   |                                         |                     Field {                     |
   |                                         |                         name: "b",              |
   |                                         |                         data_type: Int32,       |
   |                                         |                         nullable: false,        |
   |                                         |                         dict_id: 0,             |
   |                                         |                         dict_is_ordered: false, |
   |                                         |                         metadata: None,         |
   |                                         |                     },                          |
   |                                         |                     Field {                     |
   |                                         |                         name: "c",              |
   |                                         |                         data_type: Int32,       |
   |                                         |                         nullable: false,        |
   |                                         |                         dict_id: 0,             |
   |                                         |                         dict_is_ordered: false, |
   |                                         |                         metadata: None,         |
   |                                         |                     },                          |
   |                                         |                 ],                              |
   |                                         |                 metadata: {},                   |
   |                                         |             },                                  |
   |                                         |             batch_size: 8192,                   |
   |                                         |             limit: None,                        |
   |                                         |         },                                      |
   |                                         |     },                                          |
   |                                         | }                                               |
   +-----------------------------------------+-------------------------------------------------+
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#discussion_r631961359



##########
File path: datafusion/src/logical_plan/display.rs
##########
@@ -58,8 +52,7 @@ impl<'a, 'b> PlanVisitor for IndentVisitor<'a, 'b> {
         if self.indent > 0 {
             writeln!(self.f)?;
         }
-        self.write_indent()?;
-
+        write!(self.f, "{:indent$}", "", indent = self.indent * 2)?;

Review comment:
       this is a better way to make indents that I found while googling around
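    For anyone unfamiliar with the idiom: `{:indent$}` pads an empty string to a runtime-chosen minimum width. A minimal standalone sketch (not DataFusion code) of the same trick:

    ```rust
    use std::fmt::Write;

    // Writes `indent * 2` spaces by formatting the empty string with a
    // runtime-specified minimum field width.
    fn write_indent(out: &mut String, indent: usize) -> std::fmt::Result {
        write!(out, "{:indent$}", "", indent = indent * 2)
    }

    fn main() {
        let mut s = String::new();
        write_indent(&mut s, 3).unwrap();
        s.push_str("TableScan: foo");
        assert_eq!(s, "      TableScan: foo"); // six leading spaces
    }
    ```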

##########
File path: datafusion/src/physical_plan/display.rs
##########
@@ -0,0 +1,73 @@
+//! Implementation of physical plan display. See
+//! [`crate::physical_plan::displayable`] for examples of how to
+//! format
+
+use std::fmt;
+
+use super::{accept, ExecutionPlan, ExecutionPlanVisitor};
+
+/// Options for controlling how each [`ExecutionPlan`] should format itself
+#[derive(Debug, Clone, Copy)]
+pub enum DisplayFormatType {
+    /// Default, compact format. Example: `FilterExec: c12 < 10.0`
+    Default,

Review comment:
       I envision adding more types (e.g. Graphviz) as needs evolve
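    As a purely illustrative toy (a standalone `Display` impl, not the DataFusion trait), this is roughly how an operator's compact one-line form can branch on the format type; any additional variant such as Graphviz is hypothetical here:

    ```rust
    use std::fmt;

    // Mirrors the enum added in this PR; future variants (e.g. Graphviz)
    // would become further match arms.
    #[derive(Debug, Clone, Copy)]
    enum DisplayFormatType {
        Default,
    }

    struct FilterDisplay<'a> {
        predicate: &'a str,
        format: DisplayFormatType,
    }

    impl fmt::Display for FilterDisplay<'_> {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self.format {
                // Compact, single-line form, e.g. `FilterExec: c12 < 10.0`
                DisplayFormatType::Default => write!(f, "FilterExec: {}", self.predicate),
            }
        }
    }

    fn main() {
        let d = FilterDisplay { predicate: "c12 < 10.0", format: DisplayFormatType::Default };
        assert_eq!(d.to_string(), "FilterExec: c12 < 10.0");
    }
    ```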

##########
File path: datafusion/src/physical_plan/mod.rs
##########
@@ -152,6 +162,133 @@ pub trait ExecutionPlan: Debug + Send + Sync {
     fn metrics(&self) -> HashMap<String, SQLMetric> {
         HashMap::new()
     }
+
+    /// Format this `ExecutionPlan` to `f` in the specified type.
+    ///
+    /// Should not include a newline
+    ///
+    /// Note this function prints a placeholder by default to preserve
+    /// backwards compatibility.
+    fn fmt_as(&self, _t: DisplayFormatType, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(f, "ExecutionPlan(PlaceHolder)")
+    }
+}
+
+/// Return a [wrapper](DisplayableExecutionPlan) around an

Review comment:
       This is the main proposed interface: `displayable`, which returns a struct that offers several ways to display the plan. It would be ideal to add this to `ExecutionPlan` directly, but since it is a trait this was the best I could come up with (along with a bunch of documentation)
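    Condensed from the doctest for `displayable` quoted later in this thread (it assumes a CSV at `tests/example.csv`, as in that doctest), usage looks roughly like this:

    ```rust
    use datafusion::physical_plan::displayable;
    use datafusion::prelude::*;

    fn main() -> datafusion::error::Result<()> {
        // Register a table and build a physical plan, as in the doctest.
        let mut ctx = ExecutionContext::new();
        ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new())?;

        let plan = ctx.create_logical_plan("SELECT a FROM example WHERE a < 5")?;
        let plan = ctx.optimize(&plan)?;
        let physical_plan = ctx.create_physical_plan(&plan)?;

        // Wrap the plan and render the compact, indented tree form.
        println!("{}", displayable(physical_plan.as_ref()).indent());
        Ok(())
    }
    ```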

##########
File path: datafusion/tests/sql.rs
##########
@@ -2885,3 +2885,45 @@ async fn test_cast_expressions_error() -> Result<()> {
 
     Ok(())
 }
+
+#[tokio::test]
+async fn test_physical_plan_display_indent() {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx).unwrap();

Review comment:
       I thought one end-to-end test would be reasonable to make sure the output is OK and doesn't regress, while not taking too much effort to maintain







[GitHub] [arrow-datafusion] NGA-TRAN commented on a change in pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
NGA-TRAN commented on a change in pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#discussion_r632084193



##########
File path: datafusion/tests/sql.rs
##########
@@ -2885,3 +2885,45 @@ async fn test_cast_expressions_error() -> Result<()> {
 
     Ok(())
 }
+
+#[tokio::test]
+async fn test_physical_plan_display_indent() {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx).unwrap();
+    let sql = "SELECT c1, MAX(c12), MIN(c12) as the_min \
+         FROM aggregate_test_100 \
+         WHERE c12 < 10 \
+         GROUP BY c1 \
+         ORDER BY the_min DESC \
+         LIMIT 10";
+    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let plan = ctx.optimize(&plan).unwrap();
+
+    let physical_plan = ctx.create_physical_plan(&plan).unwrap();
+    let expected = vec![
+        "GlobalLimitExec: limit=10",
+        "  SortExec: [the_min DESC]",
+        "    ProjectionExec: expr=[c1, MAX(c12), MIN(c12) as the_min]",
+        "      HashAggregateExec: mode=Final, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "        MergeExec",
+        "          HashAggregateExec: mode=Partial, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "            CoalesceBatchesExec: target_batch_size=4096",
+        "              FilterExec: c12 < CAST(10 AS Float64)",
+        "                RepartitionExec: partitioning=RoundRobinBatch(16)",

Review comment:
       Make sense. Thanks @Dandandan 







[GitHub] [arrow-datafusion] alamb merged pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337


   





[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#discussion_r632087097



##########
File path: datafusion/tests/sql.rs
##########
@@ -2885,3 +2885,45 @@ async fn test_cast_expressions_error() -> Result<()> {
 
     Ok(())
 }
+
+#[tokio::test]
+async fn test_physical_plan_display_indent() {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx).unwrap();
+    let sql = "SELECT c1, MAX(c12), MIN(c12) as the_min \
+         FROM aggregate_test_100 \
+         WHERE c12 < 10 \
+         GROUP BY c1 \
+         ORDER BY the_min DESC \
+         LIMIT 10";
+    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let plan = ctx.optimize(&plan).unwrap();
+
+    let physical_plan = ctx.create_physical_plan(&plan).unwrap();
+    let expected = vec![
+        "GlobalLimitExec: limit=10",
+        "  SortExec: [the_min DESC]",
+        "    ProjectionExec: expr=[c1, MAX(c12), MIN(c12) as the_min]",
+        "      HashAggregateExec: mode=Final, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "        MergeExec",
+        "          HashAggregateExec: mode=Partial, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "            CoalesceBatchesExec: target_batch_size=4096",
+        "              FilterExec: c12 < CAST(10 AS Float64)",
+        "                RepartitionExec: partitioning=RoundRobinBatch(16)",

Review comment:
       We should statically set the concurrency level in the execution config if we want to check the plan like this
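    A sketch of what that might look like (assuming the `ExecutionConfig::with_concurrency` setter available in DataFusion around this time; the partition count 4 is arbitrary):

    ```rust
    use datafusion::execution::context::{ExecutionConfig, ExecutionContext};

    // Build a context with a fixed concurrency so the repartitioning line in
    // the expected plan (e.g. `RoundRobinBatch(4)`) does not depend on the
    // number of cores on the machine running the test.
    fn make_test_context() -> ExecutionContext {
        ExecutionContext::with_config(ExecutionConfig::new().with_concurrency(4))
    }

    fn main() {
        let _ctx = make_test_context();
    }
    ```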







[GitHub] [arrow-datafusion] alamb commented on pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#issuecomment-841394561


   Test fixed in https://github.com/apache/arrow-datafusion/pull/337/commits/03b776ffb1ff329c156c6ef1debddc510b55bebb





[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#issuecomment-841406715


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/337?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#337](https://codecov.io/gh/apache/arrow-datafusion/pull/337?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (03b776f) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/b44238d05094ab0fa0171769ce8b890a0045e1e1?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b44238d) will **decrease** coverage by `0.47%`.
   > The diff coverage is `49.19%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/337/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/337?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #337      +/-   ##
   ==========================================
   - Coverage   76.07%   75.59%   -0.48%     
   ==========================================
     Files         142      143       +1     
     Lines       23788    23695      -93     
   ==========================================
   - Hits        18097    17913     -184     
   - Misses       5691     5782      +91     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/337?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL3BsYW4ucnM=) | `81.19% <ø> (ø)` | |
   | [datafusion/src/physical\_plan/cross\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9jcm9zc19qb2luLnJz) | `73.88% <0.00%> (-2.28%)` | :arrow_down: |
   | [...tafusion/src/physical\_plan/distinct\_expressions.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9kaXN0aW5jdF9leHByZXNzaW9ucy5ycw==) | `89.65% <0.00%> (-0.70%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/empty.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9lbXB0eS5ycw==) | `83.82% <0.00%> (-5.73%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/explain.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9leHBsYWluLnJz) | `51.35% <0.00%> (-6.23%)` | :arrow_down: |
   | [...atafusion/src/physical\_plan/expressions/average.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9leHByZXNzaW9ucy9hdmVyYWdlLnJz) | `81.73% <0.00%> (-1.45%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/expressions/count.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9leHByZXNzaW9ucy9jb3VudC5ycw==) | `84.04% <0.00%> (-1.83%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/expressions/sum.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9leHByZXNzaW9ucy9zdW0ucnM=) | `76.92% <0.00%> (-1.00%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2pvaW4ucnM=) | `86.40% <0.00%> (-1.09%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/memory.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9tZW1vcnkucnM=) | `67.27% <0.00%> (-13.17%)` | :arrow_down: |
   | ... and [53 more](https://codecov.io/gh/apache/arrow-datafusion/pull/337/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/337?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/337?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [b44238d...03b776f](https://codecov.io/gh/apache/arrow-datafusion/pull/337?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-datafusion] NGA-TRAN commented on a change in pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
NGA-TRAN commented on a change in pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#discussion_r632078747



##########
File path: datafusion/tests/sql.rs
##########
@@ -2885,3 +2885,45 @@ async fn test_cast_expressions_error() -> Result<()> {
 
     Ok(())
 }
+
+#[tokio::test]
+async fn test_physical_plan_display_indent() {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx).unwrap();
+    let sql = "SELECT c1, MAX(c12), MIN(c12) as the_min \
+         FROM aggregate_test_100 \
+         WHERE c12 < 10 \
+         GROUP BY c1 \
+         ORDER BY the_min DESC \
+         LIMIT 10";
+    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let plan = ctx.optimize(&plan).unwrap();
+
+    let physical_plan = ctx.create_physical_plan(&plan).unwrap();
+    let expected = vec![
+        "GlobalLimitExec: limit=10",
+        "  SortExec: [the_min DESC]",
+        "    ProjectionExec: expr=[c1, MAX(c12), MIN(c12) as the_min]",
+        "      HashAggregateExec: mode=Final, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "        MergeExec",
+        "          HashAggregateExec: mode=Partial, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "            CoalesceBatchesExec: target_batch_size=4096",
+        "              FilterExec: c12 < CAST(10 AS Float64)",
+        "                RepartitionExec: partitioning=RoundRobinBatch(16)",

Review comment:
       What does `RepartitionExec` do? Does it split the data into smaller batches to send to multiple threads?

##########
File path: datafusion/src/physical_plan/mod.rs
##########
@@ -152,6 +162,133 @@ pub trait ExecutionPlan: Debug + Send + Sync {
     fn metrics(&self) -> HashMap<String, SQLMetric> {
         HashMap::new()
     }
+
+    /// Format this `ExecutionPlan` to `f` in the specified type.
+    ///
+    /// Should not include a newline
+    ///
+    /// Note this function prints a placeholder by default to preserve
+    /// backwards compatibility.
+    fn fmt_as(&self, _t: DisplayFormatType, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(f, "ExecutionPlan(PlaceHolder)")
+    }
+}
+
+/// Return a [wrapper](DisplayableExecutionPlan) around an
+/// [`ExecutionPlan`] which can be displayed in various easier to
+/// understand ways.
+///
+/// ```
+/// use datafusion::prelude::*;
+/// use datafusion::physical_plan::displayable;
+///
+/// // register the a table
+/// let mut ctx = ExecutionContext::new();
+/// ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new()).unwrap();
+///
+/// // create a plan to run a SQL query
+/// let plan = ctx
+///    .create_logical_plan("SELECT a FROM example WHERE a < 5")
+///    .unwrap();
+/// let plan = ctx.optimize(&plan).unwrap();
+/// let physical_plan = ctx.create_physical_plan(&plan).unwrap();
+///
+/// // Format using display string
+/// let displayable_plan = displayable(physical_plan.as_ref());
+/// let plan_string = format!("{}", displayable_plan.indent());
+///
+/// assert_eq!("ProjectionExec: expr=[a]\
+///            \n  CoalesceBatchesExec: target_batch_size=4096\
+///            \n    FilterExec: a < 5\
+///            \n      RepartitionExec: partitioning=RoundRobinBatch(16)\
+///            \n        CsvExec: source=Path(tests/example.csv: [tests/example.csv]), has_header=true",
+///             plan_string.trim());
+/// ```
+///
+pub fn displayable(plan: &dyn ExecutionPlan) -> DisplayableExecutionPlan<'_> {
+    DisplayableExecutionPlan::new(plan)
+}
+
+/// Visit all children of this plan, according to the order defined on `ExecutionPlanVisitor`.
+// Note that this would be really nice if it were a method on
+// ExecutionPlan, but it can not be because it takes a generic
+// parameter and `ExecutionPlan` is a trait
+pub fn accept<V: ExecutionPlanVisitor>(
+    plan: &dyn ExecutionPlan,
+    visitor: &mut V,
+) -> std::result::Result<(), V::Error> {
+    visitor.pre_visit(plan)?;
+    for child in plan.children() {
+        visit_execution_plan(child.as_ref(), visitor)?;
+    }
+    visitor.post_visit(plan)?;
+    Ok(())
+}
+
+/// Trait that implements the [Visitor
+/// pattern](https://en.wikipedia.org/wiki/Visitor_pattern) for a
+/// depth first walk of `ExecutionPlan` nodes. `pre_visit` is called
+/// before any children are visited, and then `post_visit` is called
+/// after all children have been visited.
+////
+/// To use, define a struct that implements this trait and then invoke
+/// ['accept'].
+///
+/// For example, for an execution plan that looks like:
+///
+/// ```text
+/// ProjectionExec: #id
+///    FilterExec: state = CO
+///       CsvExec:
+/// ```
+///
+/// The sequence of visit operations would be:
+/// ```text
+/// visitor.pre_visit(ProjectionExec)
+/// visitor.pre_visit(FilterExec)
+/// visitor.pre_visit(CsvExec)
+/// visitor.post_visit(CsvExec)
+/// visitor.post_visit(FilterExec)
+/// visitor.post_visit(ProjectionExec)
+/// ```

Review comment:
       Nice
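    To make the walk order concrete, here is a self-contained toy (not the actual `ExecutionPlanVisitor` trait, whose methods operate on `&dyn ExecutionPlan` and return `Result`s) that produces the indented rendering described in the doc comment above:

    ```rust
    // Toy visitor over a toy plan tree; illustrative only.
    struct Node {
        name: &'static str,
        children: Vec<Node>,
    }

    trait Visitor {
        fn pre_visit(&mut self, node: &Node);
        fn post_visit(&mut self, _node: &Node) {}
    }

    // Depth-first walk: pre_visit before the children, post_visit after them.
    fn accept<V: Visitor>(node: &Node, visitor: &mut V) {
        visitor.pre_visit(node);
        for child in &node.children {
            accept(child, visitor);
        }
        visitor.post_visit(node);
    }

    // Renders one line per node, indented two spaces per depth level.
    struct IndentPrinter {
        depth: usize,
        out: String,
    }

    impl Visitor for IndentPrinter {
        fn pre_visit(&mut self, node: &Node) {
            self.out
                .push_str(&format!("{:indent$}{}\n", "", node.name, indent = self.depth * 2));
            self.depth += 1;
        }
        fn post_visit(&mut self, _node: &Node) {
            self.depth -= 1;
        }
    }

    fn main() {
        let plan = Node {
            name: "ProjectionExec",
            children: vec![Node {
                name: "FilterExec",
                children: vec![Node { name: "CsvExec", children: vec![] }],
            }],
        };
        let mut printer = IndentPrinter { depth: 0, out: String::new() };
        accept(&plan, &mut printer);
        print!("{}", printer.out);
        // ProjectionExec
        //   FilterExec
        //     CsvExec
    }
    ```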

##########
File path: datafusion/src/logical_plan/plan.rs
##########
@@ -356,13 +356,15 @@ pub enum Partitioning {
 /// after all children have been visited.
 ////
 /// To use, define a struct that implements this trait and then invoke
-/// "LogicalPlan::accept".
+/// [`LogicalPlan::accept`].

Review comment:
       Question: what does this change do? Does it render better in the docs?

##########
File path: datafusion/tests/sql.rs
##########
@@ -2885,3 +2885,45 @@ async fn test_cast_expressions_error() -> Result<()> {
 
     Ok(())
 }
+
+#[tokio::test]
+async fn test_physical_plan_display_indent() {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx).unwrap();
+    let sql = "SELECT c1, MAX(c12), MIN(c12) as the_min \
+         FROM aggregate_test_100 \
+         WHERE c12 < 10 \
+         GROUP BY c1 \
+         ORDER BY the_min DESC \
+         LIMIT 10";
+    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let plan = ctx.optimize(&plan).unwrap();
+
+    let physical_plan = ctx.create_physical_plan(&plan).unwrap();
+    let expected = vec![
+        "GlobalLimitExec: limit=10",
+        "  SortExec: [the_min DESC]",
+        "    ProjectionExec: expr=[c1, MAX(c12), MIN(c12) as the_min]",
+        "      HashAggregateExec: mode=Final, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "        MergeExec",
+        "          HashAggregateExec: mode=Partial, gby=[c1], aggr=[MAX(c12), MIN(c12)]",

Review comment:
       Great to see the partial aggregate displayed here







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#discussion_r632120108



##########
File path: datafusion/tests/sql.rs
##########
@@ -2885,3 +2885,45 @@ async fn test_cast_expressions_error() -> Result<()> {
 
     Ok(())
 }
+
+#[tokio::test]
+async fn test_physical_plan_display_indent() {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx).unwrap();
+    let sql = "SELECT c1, MAX(c12), MIN(c12) as the_min \
+         FROM aggregate_test_100 \
+         WHERE c12 < 10 \
+         GROUP BY c1 \
+         ORDER BY the_min DESC \
+         LIMIT 10";
+    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let plan = ctx.optimize(&plan).unwrap();
+
+    let physical_plan = ctx.create_physical_plan(&plan).unwrap();
+    let expected = vec![
+        "GlobalLimitExec: limit=10",
+        "  SortExec: [the_min DESC]",
+        "    ProjectionExec: expr=[c1, MAX(c12), MIN(c12) as the_min]",
+        "      HashAggregateExec: mode=Final, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "        MergeExec",
+        "          HashAggregateExec: mode=Partial, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "            CoalesceBatchesExec: target_batch_size=4096",
+        "              FilterExec: c12 < CAST(10 AS Float64)",
+        "                RepartitionExec: partitioning=RoundRobinBatch(16)",

Review comment:
       good call -- I will do so. 







[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#discussion_r632701908



##########
File path: datafusion/src/logical_plan/plan.rs
##########
@@ -356,13 +356,15 @@ pub enum Partitioning {
 /// after all children have been visited.
 ////
 /// To use, define a struct that implements this trait and then invoke
-/// "LogicalPlan::accept".
+/// [`LogicalPlan::accept`].

Review comment:
       It makes it a hyperlink in the API docs :)







[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#discussion_r632083020



##########
File path: datafusion/tests/sql.rs
##########
@@ -2885,3 +2885,45 @@ async fn test_cast_expressions_error() -> Result<()> {
 
     Ok(())
 }
+
+#[tokio::test]
+async fn test_physical_plan_display_indent() {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_csv(&mut ctx).unwrap();
+    let sql = "SELECT c1, MAX(c12), MIN(c12) as the_min \
+         FROM aggregate_test_100 \
+         WHERE c12 < 10 \
+         GROUP BY c1 \
+         ORDER BY the_min DESC \
+         LIMIT 10";
+    let plan = ctx.create_logical_plan(&sql).unwrap();
+    let plan = ctx.optimize(&plan).unwrap();
+
+    let physical_plan = ctx.create_physical_plan(&plan).unwrap();
+    let expected = vec![
+        "GlobalLimitExec: limit=10",
+        "  SortExec: [the_min DESC]",
+        "    ProjectionExec: expr=[c1, MAX(c12), MIN(c12) as the_min]",
+        "      HashAggregateExec: mode=Final, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "        MergeExec",
+        "          HashAggregateExec: mode=Partial, gby=[c1], aggr=[MAX(c12), MIN(c12)]",
+        "            CoalesceBatchesExec: target_batch_size=4096",
+        "              FilterExec: c12 < CAST(10 AS Float64)",
+        "                RepartitionExec: partitioning=RoundRobinBatch(16)",

Review comment:
       Round-robin repartitioning moves the batches as-is, one by one, to different partitions (partition 1, 2, 3, etc., in round-robin order), which are executed on different threads.
   There is also hash repartitioning, which sends values to partitions (and therefore threads) based on hashed key values.
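    For reference, these two strategies correspond to variants of DataFusion's `Partitioning` enum (a sketch; the key-expression list for the hash variant is elided):

    ```rust
    use datafusion::physical_plan::Partitioning;

    fn main() {
        // Deal batches out to 16 output partitions in round-robin order.
        let round_robin = Partitioning::RoundRobinBatch(16);

        // Partitioning::Hash(exprs, 16) would instead route rows to partitions
        // by hashing the given key expressions.
        match round_robin {
            Partitioning::RoundRobinBatch(n) => println!("round robin across {} partitions", n),
            _ => {}
        }
    }
    ```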
   








[GitHub] [arrow-datafusion] alamb edited a comment on pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#issuecomment-841394561


   Test fixed in https://github.com/apache/arrow-datafusion/pull/337/commits/03b776ffb1ff329c156c6ef1debddc510b55bebb (I hope 🤞 )





[GitHub] [arrow-datafusion] Dandandan commented on pull request #337: Implement readable explain plans for physical plans

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #337:
URL: https://github.com/apache/arrow-datafusion/pull/337#issuecomment-840818860


   Looking much better @alamb !!!




