Posted to github@arrow.apache.org by "ozankabak (via GitHub)" <gi...@apache.org> on 2023/03/01 00:01:16 UTC

[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5419: refactor: ParquetExec logical expr. => phys. expr.

ozankabak commented on code in PR #5419:
URL: https://github.com/apache/arrow-datafusion/pull/5419#discussion_r1120950514


##########
datafusion/physical-expr/src/utils.rs:
##########
@@ -235,6 +239,80 @@ pub fn ordering_satisfy_concrete<F: FnOnce() -> EquivalenceProperties>(
     }
 }
 
+/// Extract referenced [`Column`]s within a [`PhysicalExpr`].
+///
+/// This works recursively.
+pub fn get_phys_expr_columns(pred: &Arc<dyn PhysicalExpr>) -> HashSet<Column> {
+    let mut rewriter = ColumnCollector::default();

Review Comment:
   Interesting! The same need (collecting columns) came up in the SHJ (symmetric hash join) implementation, so we used this more lightweight recursion:
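   ```rust
   use std::sync::Arc;

   use datafusion_physical_expr::expressions::Column;
   use datafusion_physical_expr::PhysicalExpr;

   /// Recursively pushes every distinct `Column` referenced by `expr` into `columns`.
   fn collect_columns_recursive(expr: &Arc<dyn PhysicalExpr>, columns: &mut Vec<Column>) {
       // Downcast to check whether this node is a column reference.
       if let Some(column) = expr.as_any().downcast_ref::<Column>() {
           // Linear-scan dedup: cheap when only a few columns are expected.
           if !columns.contains(column) {
               columns.push(column.clone());
           }
       }
       // Recurse into the child expressions (e.g. both sides of a binary expression).
       expr.children()
           .iter()
           .for_each(|e| collect_columns_recursive(e, columns))
   }
   
   /// Returns the distinct columns referenced by `expr`, in first-seen order.
   fn collect_columns(expr: &Arc<dyn PhysicalExpr>) -> Vec<Column> {
       let mut columns = vec![];
       collect_columns_recursive(expr, &mut columns);
       columns
   }
   ```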
   We used a `Vec` instead of a `HashSet` because we expected only a few columns, but the code is essentially the same 🙂
   
   This makes me think that a comprehensive review of the codebase to collect, consolidate, and document utilities like this one could simplify things, and would be a worthwhile effort.
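
   To make the `Vec`-versus-`HashSet` trade-off concrete, here is a minimal self-contained sketch. It is not DataFusion code: the `Expr` trait and the `Column` and `Binary` types are hypothetical stand-ins for `PhysicalExpr` and its implementors, exposing only the `as_any` (for downcasting) and `children` (for recursion) methods the traversal needs.

   ```rust
   use std::any::Any;
   use std::collections::HashSet;
   use std::sync::Arc;

   /// Hypothetical stand-in for `PhysicalExpr`: only what the traversal needs.
   trait Expr {
       fn as_any(&self) -> &dyn Any;
       fn children(&self) -> Vec<Arc<dyn Expr>>;
   }

   /// Hypothetical leaf node standing in for a column reference.
   #[derive(Debug, Clone, PartialEq, Eq, Hash)]
   struct Column {
       name: String,
       index: usize,
   }

   impl Expr for Column {
       fn as_any(&self) -> &dyn Any {
           self
       }
       fn children(&self) -> Vec<Arc<dyn Expr>> {
           vec![]
       }
   }

   /// Hypothetical binary node, e.g. `left + right`.
   struct Binary {
       left: Arc<dyn Expr>,
       right: Arc<dyn Expr>,
   }

   impl Expr for Binary {
       fn as_any(&self) -> &dyn Any {
           self
       }
       fn children(&self) -> Vec<Arc<dyn Expr>> {
           vec![Arc::clone(&self.left), Arc::clone(&self.right)]
       }
   }

   /// Vec variant: linear-scan dedup, keeps first-seen order.
   fn collect_columns_vec(expr: &Arc<dyn Expr>, columns: &mut Vec<Column>) {
       if let Some(column) = expr.as_any().downcast_ref::<Column>() {
           if !columns.contains(column) {
               columns.push(column.clone());
           }
       }
       for child in expr.children() {
           collect_columns_vec(&child, columns);
       }
   }

   /// HashSet variant: O(1) average-case dedup, no ordering guarantee.
   fn collect_columns_set(expr: &Arc<dyn Expr>, columns: &mut HashSet<Column>) {
       if let Some(column) = expr.as_any().downcast_ref::<Column>() {
           columns.insert(column.clone());
       }
       for child in expr.children() {
           collect_columns_set(&child, columns);
       }
   }

   /// Hypothetical helper that builds a column leaf.
   fn col(name: &str, index: usize) -> Arc<dyn Expr> {
       Arc::new(Column { name: name.to_string(), index })
   }

   /// Builds `(a + b) + a`, where column `a` is referenced twice.
   fn example_expr() -> Arc<dyn Expr> {
       Arc::new(Binary {
           left: Arc::new(Binary { left: col("a", 0), right: col("b", 1) }),
           right: col("a", 0),
       })
   }
   ```

   Both variants visit the same nodes; for `(a + b) + a` the `Vec` version yields `a` then `b`, while the `HashSet` version holds the same two columns in unspecified order. For the handful of columns a typical predicate references, the linear scan is unlikely to matter, and the `Vec` keeps a deterministic order.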



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org