You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/14 16:10:16 UTC

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1448: Fix bug in projection: "column types must match schema types, expected XXX but found YYY"

alamb commented on a change in pull request #1448:
URL: https://github.com/apache/arrow-datafusion/pull/1448#discussion_r768809913



##########
File path: datafusion/src/physical_plan/projection.rs
##########
@@ -63,13 +63,22 @@ impl ProjectionExec {
 
         let fields: Result<Vec<Field>> = expr
             .iter()
-            .map(|(e, name)| match input_schema.field_with_name(name) {
-                Ok(f) => Ok(f.clone()),

Review comment:
       I totally missed this in my review -- basically the projection's output is not the same for its input even if the field name matches (due to aliases). The actual output type needs to be calculated from the expr for all cases.

##########
File path: datafusion/src/physical_plan/projection.rs
##########
@@ -63,13 +63,22 @@ impl ProjectionExec {
 
         let fields: Result<Vec<Field>> = expr
             .iter()
-            .map(|(e, name)| match input_schema.field_with_name(name) {
-                Ok(f) => Ok(f.clone()),
-                Err(_) => {
-                    let dt = e.data_type(&input_schema)?;
-                    let nullable = e.nullable(&input_schema)?;
-                    Ok(Field::new(name, dt, nullable))
-                }
+            .map(|(e, name)| {

Review comment:
       I am not convinced it is a great idea to copy field metadata from the input to the output based on the field name alone... As that would mean metadata on a field named `a` from the input could end up on the output field `a` even if the output was not related at all to `a`. 
   
   Thus I ~will make~ made this code only copy field level metadata for a direct input column reference




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org