Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/09 21:01:13 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #4157: Collapse statistics in normal explain plan

alamb opened a new pull request, #4157:
URL: https://github.com/apache/arrow-datafusion/pull/4157

   # Which issue does this PR close?
   
   RE https://github.com/apache/arrow-datafusion/issues/4144
   
   # Rationale for this change
   
   It is almost impossible to get any useful information out of `explain analyze` for a parquet exec when it has more than one or two files. See https://github.com/apache/arrow-datafusion/issues/4144 for examples.
   
   # What changes are included in this PR?
   
   Collapses all the statistics from a parquet exec that share the same name when showing them in the explain plan
   
   Before:
   ```
   |                   |           ParquetExec: limit=None, partitions=[Users/alamb/Software/arrow-datafusion2/parquet-testing/data/alltypes_plain.parquet], predicate=timestamp_col_max@0 > 1233446400000000000, projection=[id, float_col, timestamp_col], metrics=[output_rows=8, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, bytes_scanned{filename=Users/alamb/Software/arrow-datafusion2/parquet-testing/data/alltypes_plain.parquet}=259, pushdown_rows_filtered{filename=Users/alamb/Software/arrow-datafusion2/parquet-testing/data/alltypes_plain.parquet}=0, row_groups_pruned{filename=Users/alamb/Software/arrow-datafusion2/parquet-testing/data/alltypes_plain.parquet}=0, predicate_evaluation_errors{filename=Users/alamb/Software/arrow-datafusion2/parquet-testing/data/alltypes_plain.parquet}=0, page_index_rows_filtered{filename=Users/alamb/Software/arrow-datafusion2/parquet-testing/data/alltypes_plain.parquet}=0, num_predicate_creation_errors=0, page_index_eval_time{filename=Users/alamb/Software/arrow-datafusion2/parquet-testing/data/alltypes_plain.parquet}=2ns, time_elapsed_opening=558.733µs, time_elapsed_processing=1.103953ms, pushdown_eval_time{filename=Users/alamb/Software/arrow-datafusion2/parquet-testing/data/alltypes_plain.parquet}=2ns, time_elapsed_scanning=656.927µs] |
   ```
   
   After (still not great, but easier to read):
   ```
   |                   |           ParquetExec: limit=None, partitions=[Users/alamb/Software/arrow-datafusion2/parquet-testing/data/alltypes_plain.parquet], predicate=timestamp_col_max@0 > 1233446400000000000, projection=[id, float_col, timestamp_col], metrics=[output_rows=8, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, predicate_evaluation_errors=0, pushdown_rows_filtered=0, bytes_scanned=259, num_predicate_creation_errors=0, row_groups_pruned=0, page_index_rows_filtered=0, time_elapsed_opening=588.861µs, time_elapsed_scanning=644.732µs, time_elapsed_processing=1.115933ms, page_index_eval_time=2ns, pushdown_eval_time=2ns] |
   ```
   
   # Are these changes tested?
   Yes
   # Are there any user-facing changes?
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4157: Collapse statistics in normal explain plan

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4157:
URL: https://github.com/apache/arrow-datafusion/pull/4157#discussion_r1018410829


##########
datafusion/core/src/physical_plan/metrics/mod.rs:
##########
@@ -140,7 +140,7 @@ impl Metric {
     }
 
     /// Add a new label to this metric
-    pub fn with(mut self, label: Label) -> Self {
+    pub fn with_label(mut self, label: Label) -> Self {

Review Comment:
   Drive-by cleanup to make the setter conform to the standard `with_` naming convention
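
For readers unfamiliar with the convention, here is a minimal sketch of a consuming `with_`-prefixed setter. `MetricSketch` and `SimpleLabel` are hypothetical stand-ins invented for illustration; they are not the actual DataFusion `Metric` and `Label` types, and only the method name and its `mut self -> Self` shape come from the diff above.

```rust
/// Hypothetical stand-in for a label: a name/value pair.
#[derive(Debug, Clone, PartialEq)]
struct SimpleLabel {
    name: String,
    value: String,
}

/// Hypothetical stand-in for a metric that carries a list of labels.
#[derive(Debug, Default)]
struct MetricSketch {
    labels: Vec<SimpleLabel>,
}

impl MetricSketch {
    /// Consuming setter: takes ownership of `self`, appends the label,
    /// and returns `self` so that calls can be chained.
    fn with_label(mut self, label: SimpleLabel) -> Self {
        self.labels.push(label);
        self
    }
}
```

Because the setter returns `Self`, several labels can be attached in one expression, for example `MetricSketch::default().with_label(a).with_label(b)`.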



##########
datafusion/core/src/physical_plan/metrics/mod.rs:
##########
@@ -259,27 +259,23 @@ impl MetricsSet {
     }
 
     /// Returns returns a new derived `MetricsSet` where all metrics
-    /// that had the same name and partition=`Some(..)` have been
+    /// that had the same name have been
     /// aggregated together. The resulting `MetricsSet` has all
     /// metrics with `Partition=None`
-    pub fn aggregate_by_partition(&self) -> Self {
+    pub fn aggregate_by_name(&self) -> Self {

Review Comment:
   This is the core change
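
The idea of aggregating by name can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `SimpleMetric` is a hypothetical type holding a single count, whereas the real `MetricsSet` handles several kinds of metric values (counts, times, and so on). The sketch only shows how metrics that differ in labels or partition collapse into one entry per name.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified metric: a name, optional labels
/// (such as `filename`), an optional partition, and a count.
#[derive(Debug, Clone, PartialEq)]
struct SimpleMetric {
    name: String,
    labels: Vec<(String, String)>,
    partition: Option<usize>,
    value: usize,
}

/// Sum all metrics that share a name, ignoring labels and partition.
/// Every resulting metric has no labels and `partition == None`.
fn aggregate_by_name(metrics: &[SimpleMetric]) -> Vec<SimpleMetric> {
    // A BTreeMap keeps the output ordered by metric name.
    let mut totals: BTreeMap<&str, usize> = BTreeMap::new();
    for m in metrics {
        *totals.entry(m.name.as_str()).or_insert(0) += m.value;
    }
    totals
        .into_iter()
        .map(|(name, value)| SimpleMetric {
            name: name.to_string(),
            labels: Vec::new(),
            partition: None,
            value,
        })
        .collect()
}
```

With this shape, two `bytes_scanned` entries labeled with different filenames would be reported as a single `bytes_scanned` total, which is the effect visible in the "After" output in the PR description.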





[GitHub] [arrow-datafusion] ursabot commented on pull request #4157: Collapse statistics in normal explain plan

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #4157:
URL: https://github.com/apache/arrow-datafusion/pull/4157#issuecomment-1317359349

   Benchmark runs are scheduled for baseline = d52234fb9232326da6e69d9cd9dfaa5293808eba and contender = 75ef1945f9aa832f4b25817c14b2517c7b33f073. 75ef1945f9aa832f4b25817c14b2517c7b33f073 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/0b24c7347e6447f29fadab03351c4e3d...5273dd3ef4eb4221ac3a43214991d78b/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/4a36c8e4cee544d0a04cb2184383d506...c9bde72738b74c1cb2f1b6e6711a5c12/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/0faa26f8eaa34a7595eb69c23c2198f1...c68927e0faac4beb99a65639d7b41f4e/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/92ccd700f3be410ebc85bea7fffcc55d...f6acb1f8ba2e48a59ed2391cb353ab89/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow-datafusion] andygrove closed pull request #4157: Collapse statistics in normal explain plan

Posted by GitBox <gi...@apache.org>.
andygrove closed pull request #4157: Collapse statistics in normal explain plan
URL: https://github.com/apache/arrow-datafusion/pull/4157




[GitHub] [arrow-datafusion] alamb commented on pull request #4157: Collapse statistics in normal explain plan

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4157:
URL: https://github.com/apache/arrow-datafusion/pull/4157#issuecomment-1317496720

   > @alamb Thanks for doing this, this metric system is so subtle.❤️
   
   Yeah, I didn't really know what I was doing when I originally implemented it. It is definitely due for a revamp if someone has the time. I suspect it could be made much simpler.

