Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/23 15:14:58 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue #396: DataFusion benchmarks should show executed plan with metrics after query completes

andygrove opened a new issue #396:
URL: https://github.com/apache/arrow-datafusion/issues/396


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I would like to be able to see metrics for a query plan after it is executed in the benchmarks. This is a convenient way to see where performance bottlenecks are in a query.
   
   The following example shows metrics for `SortExec`, which is the only operator that we have implemented metrics for so far; the other operators therefore show an empty `metrics=[]`.
   
   ```
   SortExec: [revenue DESC] metrics=[sortTime=56686,outputRows=5]
     MergeExec metrics=[]
       ProjectionExec: expr=[n_name, SUM(l_extendedprice Multiply Int64(1) Minus l_discount) as revenue] metrics=[]
   ```
   
   **Describe the solution you'd like**
   
   To produce the above example, I simply hacked the existing `IndentVisitor` as shown below, but this is not a good solution. It wasn't immediately clear to me how to implement this in a way that fits the current design. Should there be a `MetricsVisitor` that we can somehow combine with the `IndentVisitor`? I also looked at adding a new variant to the `DisplayFormatType` enum, but that required code changes in specific operators, so that didn't seem ideal.
   
   ```rust
   fn pre_visit(
       &mut self,
       plan: &dyn ExecutionPlan,
   ) -> std::result::Result<bool, Self::Error> {
       write!(self.f, "{:indent$}", "", indent = self.indent * 2)?;
       plan.fmt_as(self.t, self.f)?;
       // BEGIN METRICS HACK
       let metrics_str = plan.metrics().iter()
           .map(|(k, v)| format!("{}={}", k, v.value()))
           .collect::<Vec<String>>();
       write!(self.f, " metrics=[{}]", metrics_str.join(","))?;
       // END METRICS HACK
       writeln!(self.f)?;
       self.indent += 1;
       Ok(true)
   }
   ```
   
   **Describe alternatives you've considered**
   None
   
   **Additional context**
   None
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] andygrove commented on issue #396: DataFusion benchmarks should show executed plan with metrics after query completes

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #396:
URL: https://github.com/apache/arrow-datafusion/issues/396#issuecomment-846579179


   @alamb Do you have any design advice for this?





[GitHub] [arrow-datafusion] alamb commented on issue #396: DataFusion benchmarks should show executed plan with metrics after query completes

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #396:
URL: https://github.com/apache/arrow-datafusion/issues/396#issuecomment-875789469


   @NGA-TRAN  -- 👍  -- see also #679 





[GitHub] [arrow-datafusion] NGA-TRAN commented on issue #396: DataFusion benchmarks should show executed plan with metrics after query completes

Posted by GitBox <gi...@apache.org>.
NGA-TRAN commented on issue #396:
URL: https://github.com/apache/arrow-datafusion/issues/396#issuecomment-875625584


   Regarding metrics, I think the following items will be useful:
     1. Sort algorithm (quick sort, ...)
     2. If possible, how many partitions (input streams) get sorted.
     3. Depending on the sort algorithm, we may be able to report the input encoding (e.g. RLE) and the number of distinct values. I think these metrics will also help us evaluate the effectiveness of the sort algorithm we choose.
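
To make the list above concrete, here is a minimal sketch of how such metrics might be carried and printed in the `metrics=[...]` style used earlier in this thread. Everything in it is an assumption made for illustration: `SortMetrics`, its field names, and the `sortAlgorithm`, `inputPartitions`, `inputEncoding` and `distinctValues` keys are hypothetical and are not part of DataFusion. Only `sortTime` and `outputRows` appear in the original example output.

```rust
use std::fmt;

/// Hypothetical container for sort metrics (not a DataFusion type).
/// Optional fields are only known for some algorithms or inputs.
#[derive(Debug, Default)]
struct SortMetrics {
    sort_time: u64,
    output_rows: usize,
    algorithm: Option<&'static str>,
    input_partitions: Option<usize>,
    input_encoding: Option<&'static str>,
    distinct_values: Option<usize>,
}

impl fmt::Display for SortMetrics {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The two metrics shown in the issue are always printed.
        let mut parts = vec![
            format!("sortTime={}", self.sort_time),
            format!("outputRows={}", self.output_rows),
        ];
        // The proposed metrics are printed only when they are known.
        if let Some(a) = self.algorithm {
            parts.push(format!("sortAlgorithm={}", a));
        }
        if let Some(n) = self.input_partitions {
            parts.push(format!("inputPartitions={}", n));
        }
        if let Some(e) = self.input_encoding {
            parts.push(format!("inputEncoding={}", e));
        }
        if let Some(n) = self.distinct_values {
            parts.push(format!("distinctValues={}", n));
        }
        write!(f, "metrics=[{}]", parts.join(","))
    }
}
```

Using `Option` for the proposed items keeps the output unchanged for operators that cannot report them, which seems to match the "if possible" wording in the list.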





[GitHub] [arrow-datafusion] alamb commented on issue #396: DataFusion benchmarks should show executed plan with metrics after query completes

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #396:
URL: https://github.com/apache/arrow-datafusion/issues/396#issuecomment-848770549


   @NGA-TRAN has also been working on similar functionality for IOx and may have some feedback here.





[GitHub] [arrow-datafusion] alamb commented on issue #396: DataFusion benchmarks should show executed plan with metrics after query completes

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #396:
URL: https://github.com/apache/arrow-datafusion/issues/396#issuecomment-847212797


   I think adding a flag on the `IndentVisitor` like `show_metrics` is a reasonable idea, to be honest. We could also make a special-purpose visitor, but that might be overkill.
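
As a rough illustration of the flag-based idea, the sketch below uses stand-in types rather than DataFusion's real ones. The `Plan` trait, the `Node` struct, and the `walk`, `render` and `sample_plan` helpers are all hypothetical. Only the names `IndentVisitor` and `show_metrics` and the `metrics=[k=v,...]` output format come from this thread.

```rust
use std::fmt::{self, Write};

/// Stand-in for an execution plan node; NOT DataFusion's `ExecutionPlan`.
trait Plan {
    fn describe(&self) -> String;
    fn metrics(&self) -> Vec<(String, usize)>;
    fn children(&self) -> Vec<&dyn Plan>;
}

/// Hypothetical concrete node used only for this example.
struct Node {
    name: &'static str,
    metrics: Vec<(&'static str, usize)>,
    children: Vec<Node>,
}

impl Plan for Node {
    fn describe(&self) -> String {
        self.name.to_string()
    }
    fn metrics(&self) -> Vec<(String, usize)> {
        self.metrics.iter().map(|(k, v)| (k.to_string(), *v)).collect()
    }
    fn children(&self) -> Vec<&dyn Plan> {
        self.children.iter().map(|c| c as &dyn Plan).collect()
    }
}

/// Indenting visitor with an opt-in `show_metrics` flag.
struct IndentVisitor<'a> {
    f: &'a mut dyn Write,
    indent: usize,
    show_metrics: bool,
}

impl<'a> IndentVisitor<'a> {
    fn pre_visit(&mut self, plan: &dyn Plan) -> fmt::Result {
        // Two spaces of indentation per tree level.
        write!(self.f, "{:indent$}", "", indent = self.indent * 2)?;
        write!(self.f, "{}", plan.describe())?;
        // Metrics are appended to the same line only when requested.
        if self.show_metrics {
            let parts: Vec<String> = plan
                .metrics()
                .iter()
                .map(|(k, v)| format!("{}={}", k, v))
                .collect();
            write!(self.f, " metrics=[{}]", parts.join(","))?;
        }
        writeln!(self.f)?;
        self.indent += 1;
        Ok(())
    }

    fn post_visit(&mut self, _plan: &dyn Plan) -> fmt::Result {
        self.indent -= 1;
        Ok(())
    }
}

/// Depth-first traversal that drives the visitor.
fn walk(plan: &dyn Plan, v: &mut IndentVisitor<'_>) -> fmt::Result {
    v.pre_visit(plan)?;
    for child in plan.children() {
        walk(child, v)?;
    }
    v.post_visit(plan)
}

/// Renders a plan tree to a `String`, with or without metrics.
fn render(plan: &dyn Plan, show_metrics: bool) -> String {
    let mut out = String::new();
    let mut v = IndentVisitor { f: &mut out, indent: 0, show_metrics };
    walk(plan, &mut v).expect("writing to a String does not fail");
    out
}

/// Small plan shaped like the example in the issue description.
fn sample_plan() -> Node {
    Node {
        name: "SortExec: [revenue DESC]",
        metrics: vec![("sortTime", 56686), ("outputRows", 5)],
        children: vec![Node {
            name: "MergeExec",
            metrics: vec![],
            children: vec![Node { name: "ProjectionExec", metrics: vec![], children: vec![] }],
        }],
    }
}
```

With this shape, `render(&sample_plan(), false)` keeps the existing plain output, while `render(&sample_plan(), true)` appends a `metrics=[...]` suffix to every line, so no operator-specific formatting code needs to change.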





[GitHub] [arrow-datafusion] Dandandan closed issue #396: DataFusion benchmarks should show executed plan with metrics after query completes

Posted by GitBox <gi...@apache.org>.
Dandandan closed issue #396:
URL: https://github.com/apache/arrow-datafusion/issues/396


   

