Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/06 03:17:47 UTC

[GitHub] [arrow] jorgecarleitao commented on pull request #8839: ARROW-10732: [Rust] [DataFusion] Integrate DFSchema as a step towards supporting qualified column names

jorgecarleitao commented on pull request #8839:
URL: https://github.com/apache/arrow/pull/8839#issuecomment-739448512


   Hey @andygrove. Thanks a lot for this!
   
   I would benefit from understanding the use case for `DFSchema` in the physical plan. This is primarily for my own understanding: I am only familiar with qualified names in SQL as a way to disambiguate columns in expressions that involve more than one table, not as part of how a statement is represented in the logical and physical plans. Maybe you could give an example of where `arrow::Schema` is not sufficient at the physical level?
   
   My current understanding is that, without qualifiers, we can't write things like `(table1.a + 1) >= (table2.b - 1)`.
   
   What I am trying to understand is when we need such an expression at the physical level. Typically, these plans require some form of join and are mapped to `filter(join(a, b))`, in which case I do not see how a qualifier is used: before the join there are two input nodes that are joined on a key (i.e. always an equality relationship between columns); after the join there is a single node, and thus qualifiers are not needed.
   
   One use case I see for this is when the join is itself over an expression, e.g. `JOIN ON (table1.a + 1) == (table2.b - 1)`. However, in this case, at the physical level, this can always be mapped to `join(projection())`. I.e. it seems to me that it is more of a convenience when building a logical statement than a necessity for executing such a statement.
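   
   A minimal sketch of the `join(projection())` mapping, in plain Rust over in-memory rows. The `Left` and `Right` types and the `project`, `equi_join` and `join_on_expression` functions are hypothetical names chosen for illustration; none of them are part of arrow or DataFusion. Each side first computes its key expression as an extra column, and the join itself is then an ordinary equality join that never needs a qualifier:

```rust
use std::collections::HashMap;

/// Hypothetical row types; neither is part of arrow or DataFusion.
#[derive(Debug, Clone, PartialEq)]
struct Left {
    a: i64,
}

#[derive(Debug, Clone, PartialEq)]
struct Right {
    b: i64,
}

/// Projection step: pair each row with a computed key column.
fn project<T: Clone>(rows: &[T], key: impl Fn(&T) -> i64) -> Vec<(i64, T)> {
    rows.iter().map(|r| (key(r), r.clone())).collect()
}

/// Plain hash equi-join on the precomputed key.
/// Each input is a separate node, so no qualifier is required here.
fn equi_join<L: Clone, R: Clone>(left: &[(i64, L)], right: &[(i64, R)]) -> Vec<(L, R)> {
    // Build a hash index over the right side, keyed by the projected column.
    let mut index: HashMap<i64, Vec<&R>> = HashMap::new();
    for (k, r) in right {
        index.entry(*k).or_default().push(r);
    }
    // Probe the index with every left row.
    let mut out = Vec::new();
    for (k, l) in left {
        if let Some(matches) = index.get(k) {
            for r in matches {
                out.push((l.clone(), (*r).clone()));
            }
        }
    }
    out
}

/// `JOIN ON (table1.a + 1) == (table2.b - 1)` written as join(projection()).
fn join_on_expression(t1: &[Left], t2: &[Right]) -> Vec<(Left, Right)> {
    let left = project(t1, |r| r.a + 1);
    let right = project(t2, |r| r.b - 1);
    equi_join(&left, &right)
}
```

   The expression is evaluated once per input, inside that input's own projection, which is why the join operator only ever compares two already-materialized key columns.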
   
   If the goal is that we can add the qualifier to the column name after the join, to disambiguate `table1.a` from `table2.a`, wouldn't it be easier to do that at the logical plan?
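   
   One way to picture that alternative, as a rough sketch only: a helper that runs while the join's output schema is being built and prefixes a column with its table name when both inputs share that name. The function `joined_column_names` and its signature are assumptions made for this example and do not correspond to any existing DataFusion API:

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns `table.col` when the other join input
/// also has a column called `col`, and the bare name otherwise.
fn qualify(table: &str, col: &str, other: &HashSet<&str>) -> String {
    if other.contains(col) {
        format!("{}.{}", table, col)
    } else {
        col.to_string()
    }
}

/// Hypothetical helper: output column names of a join, left columns first.
/// Only names that appear on both sides are qualified.
fn joined_column_names(
    left_table: &str,
    left_cols: &[&str],
    right_table: &str,
    right_cols: &[&str],
) -> Vec<String> {
    let left_set: HashSet<&str> = left_cols.iter().copied().collect();
    let right_set: HashSet<&str> = right_cols.iter().copied().collect();

    let mut out = Vec::new();
    for c in left_cols {
        out.push(qualify(left_table, c, &right_set));
    }
    for c in right_cols {
        out.push(qualify(right_table, c, &left_set));
    }
    out
}
```

   With names resolved this way, the node after the join would carry plain, unique strings, which an unqualified schema can already represent.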

