Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/07 13:58:18 UTC

[GitHub] [arrow] alamb commented on pull request #8839: ARROW-10732: [Rust] [DataFusion] Integrate DFSchema as a step towards supporting qualified column names

alamb commented on pull request #8839:
URL: https://github.com/apache/arrow/pull/8839#issuecomment-739934509


   > As you can see, the data_type and nullable use the schema from the plan whereas the evaluate method uses the schema from the record batch, which is a little inconsistent. They should probably all use the same schema.
   
   I agree -- I recommend using the schema from the plan for consistency.
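
A minimal sketch of what "use the schema from the plan" could look like, assuming a column expression that resolves its position once against the plan schema and then reads batches by position only. All types here (`Schema`, `Field`, `Batch`, `Column`) are simplified, hypothetical stand-ins, not the actual Arrow or DataFusion definitions:

```rust
// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the real Arrow / DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

#[derive(Debug, Clone)]
pub struct Schema {
    pub fields: Vec<Field>,
}

/// A batch that exposes columns by position only, so name lookup
/// can never go through the batch's own schema.
pub struct Batch {
    pub columns: Vec<Vec<i64>>,
}

/// Column expression bound to an index resolved from the plan schema.
pub struct Column {
    index: usize,
}

impl Column {
    /// Resolve the column name against the plan schema, once, at planning time.
    pub fn try_new(name: &str, plan_schema: &Schema) -> Result<Self, String> {
        plan_schema
            .fields
            .iter()
            .position(|f| f.name == name)
            .map(|index| Column { index })
            .ok_or_else(|| format!("no field named '{}' in plan schema", name))
    }

    /// Type information comes from the plan schema.
    pub fn data_type(&self, plan_schema: &Schema) -> DataType {
        plan_schema.fields[self.index].data_type.clone()
    }

    /// Nullability comes from the plan schema as well.
    pub fn nullable(&self, plan_schema: &Schema) -> bool {
        plan_schema.fields[self.index].nullable
    }

    /// Evaluation uses the index derived from the plan schema,
    /// not a name lookup in the batch.
    pub fn evaluate<'a>(&self, batch: &'a Batch) -> Result<&'a [i64], String> {
        batch
            .columns
            .get(self.index)
            .map(|c| c.as_slice())
            .ok_or_else(|| format!("batch has no column at index {}", self.index))
    }
}
```

With this shape, `data_type`, `nullable`, and `evaluate` all depend on the same lookup, so they cannot disagree about which column is meant.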
   
   > This IMO leaves us with 2., which is what I would try: change the physical planner to alias/rewrite column names with the qualifier when the physical plan is created. This will cause the resulting RecordBatch's schema to have columns named t1.a and t2.a, thereby guaranteeing the invariant that the output schema of the physical execution matches the schema of the logical plan.
   
   
   I agree with this recommendation. When moving from the logical plan to the physical plan, I would always use the fully qualified name of each field, which avoids ambiguity. If we don't like `t1.foo` being sprinkled around in plans that have only one table, or where the column names aren't ambiguous, we could implement a (logical plan) optimizer pass to remove unneeded qualifiers.
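
To make the two ideas concrete, here is a rough sketch, assuming a hypothetical `QualifiedField` type (a field name plus an optional table qualifier). The names `physical_column_names` and `strip_unneeded_qualifiers` are made up for illustration and are not DataFusion APIs. The first function always emits fully qualified names; the second drops a qualifier only when the bare name is unique:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema field with an optional table qualifier.
#[derive(Debug, Clone, PartialEq)]
pub struct QualifiedField {
    pub qualifier: Option<String>,
    pub name: String,
}

impl QualifiedField {
    pub fn new(qualifier: Option<&str>, name: &str) -> Self {
        QualifiedField {
            qualifier: qualifier.map(|q| q.to_string()),
            name: name.to_string(),
        }
    }

    /// "t1.a" when a qualifier is present, otherwise just "a".
    pub fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }
}

/// Logical --> physical: always use the fully qualified name.
pub fn physical_column_names(fields: &[QualifiedField]) -> Vec<String> {
    fields.iter().map(QualifiedField::qualified_name).collect()
}

/// Sketch of an optimizer pass: remove the qualifier from every field
/// whose unqualified name appears exactly once.
pub fn strip_unneeded_qualifiers(fields: &[QualifiedField]) -> Vec<QualifiedField> {
    // Count how often each bare name occurs across all fields.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for f in fields {
        *counts.entry(f.name.as_str()).or_insert(0) += 1;
    }
    fields
        .iter()
        .map(|f| {
            if counts.get(f.name.as_str()).copied().unwrap_or(0) == 1 {
                // Unambiguous: the qualifier carries no information.
                QualifiedField { qualifier: None, name: f.name.clone() }
            } else {
                // Ambiguous: keep the qualifier.
                f.clone()
            }
        })
        .collect()
}
```

For fields `t1.a`, `t2.a`, and `t1.b`, the pass would keep `t1.a` and `t2.a` qualified and reduce `t1.b` to `b`.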

