Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/11 20:42:17 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2463: Fix projection pushdown produces incorrect results when column names are reused

alamb commented on code in PR #2463:
URL: https://github.com/apache/arrow-datafusion/pull/2463#discussion_r870737808


##########
datafusion/core/src/optimizer/projection_push_down.rs:
##########
@@ -172,16 +172,7 @@ fn optimize_plan(
                 _execution_props,
             )?;
 
-            let new_required_columns_optimized = new_input
-                .schema()
-                .fields()
-                .iter()
-                .map(|f| f.qualified_column())
-                .collect::<HashSet<Column>>();
-
-            if new_fields.is_empty()
-                || (has_projection && &new_required_columns_optimized == required_columns)

Review Comment:
   I wonder if the issue is that this code checks only column names to detect a reorder. The real check, as exposed by your reproducer, is that the column names are the same *AND* that the expressions are only column references or constants. For computed expressions (like the one in your reproducer), equal names are not sufficient.
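
To make the condition concrete, here is a rough, standalone sketch of the check I have in mind. It does not use DataFusion's real types: `SimpleExpr` and `projection_is_redundant` are hypothetical names invented for illustration, and an actual fix would need to work with `Expr`, `Column`, and the plan schema instead.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a projection expression.
/// This is NOT DataFusion's `Expr`; it only models the three cases
/// that matter for the check discussed above.
#[derive(Debug, Clone, PartialEq)]
enum SimpleExpr {
    /// A plain reference to an input column.
    Column(String),
    /// A constant value.
    Literal(i64),
    /// Any computed expression, together with the name of its output column.
    Computed { output_name: String },
}

impl SimpleExpr {
    /// Name of the column this expression produces in the output schema.
    fn output_name(&self) -> String {
        match self {
            SimpleExpr::Column(name) => name.clone(),
            SimpleExpr::Literal(value) => value.to_string(),
            SimpleExpr::Computed { output_name } => output_name.clone(),
        }
    }

    /// True only for column references and constants.
    fn is_column_or_literal(&self) -> bool {
        matches!(self, SimpleExpr::Column(_) | SimpleExpr::Literal(_))
    }
}

/// Treat the projection as a pure reorder only when BOTH conditions hold:
/// the output names equal the input column names, and no expression
/// computes a new value under a reused name.
fn projection_is_redundant(exprs: &[SimpleExpr], input_columns: &HashSet<String>) -> bool {
    let output_names: HashSet<String> = exprs.iter().map(|e| e.output_name()).collect();
    &output_names == input_columns && exprs.iter().all(|e| e.is_column_or_literal())
}
```

With input columns `a` and `b`, a projection of `[b, a]` passes both conditions, while a projection of `a` plus a computed expression aliased to `b` has the same set of names but fails the second condition, so it would not be dropped.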