Posted to github@arrow.apache.org by "crepererum (via GitHub)" <gi...@apache.org> on 2023/03/01 09:22:48 UTC

[GitHub] [arrow-datafusion] crepererum commented on a diff in pull request #5419: refactor: ParquetExec logical expr. => phys. expr.

crepererum commented on code in PR #5419:
URL: https://github.com/apache/arrow-datafusion/pull/5419#discussion_r1121397443


##########
datafusion/physical-expr/src/utils.rs:
##########
@@ -235,6 +239,80 @@ pub fn ordering_satisfy_concrete<F: FnOnce() -> EquivalenceProperties>(
     }
 }
 
+/// Extract referenced [`Column`]s within a [`PhysicalExpr`].
+///
+/// This works recursively.
+pub fn get_phys_expr_columns(pred: &Arc<dyn PhysicalExpr>) -> HashSet<Column> {
+    let mut rewriter = ColumnCollector::default();

Review Comment:
   @ozankabak the issue w/ `Vec` is that every membership check on it is a linear scan, so you end up with `O(n^2)` complexity in the number of used columns. In InfluxDB IOx we sometimes have schemas w/ over 200 columns, and I'm somewhat worried that such a simple oversight quickly turns into a performance bug.
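
   To make the complexity argument concrete, here is a minimal sketch, not the code in this PR. The `Column` struct is a simplified stand-in, and `collect_into_vec`, `collect_into_set` and `sample_refs` are hypothetical helpers made up for illustration. It contrasts deduplicating column references into a `Vec` with collecting them into a `HashSet`:

```rust
use std::collections::HashSet;

/// Simplified stand-in for a physical column reference
/// (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Column {
    pub name: String,
    pub index: usize,
}

/// Deduplicate into a `Vec`: every insert first scans what has been
/// collected so far, so n distinct columns cost O(n^2) comparisons overall.
pub fn collect_into_vec(refs: &[Column]) -> Vec<Column> {
    let mut out: Vec<Column> = Vec::new();
    for c in refs {
        // `contains` is a linear scan over `out`.
        if !out.contains(c) {
            out.push(c.clone());
        }
    }
    out
}

/// Deduplicate into a `HashSet`: each insert is O(1) on average,
/// so n column references cost O(n) overall.
pub fn collect_into_set(refs: &[Column]) -> HashSet<Column> {
    refs.iter().cloned().collect()
}

/// Build `n` distinct columns, each referenced twice (hypothetical helper).
pub fn sample_refs(n: usize) -> Vec<Column> {
    (0..n)
        .chain(0..n)
        .map(|i| Column { name: format!("c{i}"), index: i })
        .collect()
}
```

   Both helpers return the same set of distinct columns for the same input. Only the cost differs: the `Vec` variant grows quadratically with the number of distinct columns, while the `HashSet` variant stays roughly linear.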


