Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/03 08:40:26 UTC

[GitHub] [arrow-datafusion] houqp commented on pull request #55: Support qualified columns in queries

houqp commented on pull request #55:
URL: https://github.com/apache/arrow-datafusion/pull/55#issuecomment-831115015


   Alright, I wrote some docs over the weekend to help align expectations:
   
   * Document for output schema field name semantics with examples: https://docs.google.com/document/d/1uviWavwEGD3qxwMk2AGkOgp6ENrvKGiMWQhHNbqPwhg/edit?usp=sharing
   * Proposed change to @jorgecarleitao's invariant doc: https://docs.google.com/document/d/158gbfDp8pcakfriT2l7dHChwJB43_RV7lcWfxEC73ng/edit?usp=sharing
   * Updated invariant doc with proposed changes applied: https://docs.google.com/document/d/1dbK-3eaTHlzZcHzpTk1h-LA3b7dcxsVBcoZeVKYIPwI/edit?usp=sharing
   
   Please feel free to comment and make suggestions in the docs. One conclusion from my research is that every system names output fields in a slightly different way, and PostgreSQL wins the laziest developer award.
   
   @andygrove, with regard to schemaless data sources, I think it might be better to handle them with a new set of schemaless physical nodes. All of our current physical nodes require knowing data types at planning time, which is not possible for schemaless data sources. The switch to using the column index as the unique identifier is unavoidable here, because column names are no longer guaranteed to be unique once relations are introduced. For example, two joined tables could both contribute columns with the same name. I will do more research and look at Drill's source code to see whether there are better ways to handle schemaless data sources.
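   To make the uniqueness problem concrete, here is a minimal Rust sketch. `QualifiedField`, `Schema`, `join`, and `index_of` are hypothetical, simplified stand-ins invented for illustration, not DataFusion's actual types or API. After joining two tables that each have an `id` column, the bare name `id` no longer identifies a single column, while a relation-qualified name or a positional index still does.

   ```rust
   // Hypothetical, simplified model of a relation-qualified schema.
   // These types are illustrative only and are NOT DataFusion's real API.
   #[derive(Debug, Clone, PartialEq)]
   struct QualifiedField {
       relation: Option<String>, // e.g. Some("t1") for t1.id
       name: String,
   }

   struct Schema {
       fields: Vec<QualifiedField>,
   }

   impl Schema {
       /// Concatenate two schemas, as a join's output schema would.
       fn join(left: &Schema, right: &Schema) -> Schema {
           let mut fields = left.fields.clone();
           fields.extend(right.fields.iter().cloned());
           Schema { fields }
       }

       /// Resolve a (possibly qualified) column to its positional index.
       /// Returns Err if the name is unknown or matches more than one field.
       fn index_of(&self, relation: Option<&str>, name: &str) -> Result<usize, String> {
           let matches: Vec<usize> = self
               .fields
               .iter()
               .enumerate()
               .filter(|(_, f)| {
                   f.name == name
                       && match relation {
                           Some(r) => f.relation.as_deref() == Some(r),
                           None => true, // unqualified: match any relation
                       }
               })
               .map(|(i, _)| i)
               .collect();
           match matches.as_slice() {
               [i] => Ok(*i),
               [] => Err(format!("column '{}' not found", name)),
               _ => Err(format!("column '{}' is ambiguous", name)),
           }
       }
   }

   /// Hypothetical helper to build a qualified field.
   fn field(relation: &str, name: &str) -> QualifiedField {
       QualifiedField {
           relation: Some(relation.to_string()),
           name: name.to_string(),
       }
   }

   fn main() {
       let t1 = Schema { fields: vec![field("t1", "id"), field("t1", "name")] };
       let t2 = Schema { fields: vec![field("t2", "id"), field("t2", "score")] };
       let joined = Schema::join(&t1, &t2);

       // The bare name "id" now appears twice, so it cannot identify a column.
       assert!(joined.index_of(None, "id").is_err());
       // Qualified lookups resolve to distinct positions in the joined schema.
       assert_eq!(joined.index_of(Some("t1"), "id"), Ok(0));
       assert_eq!(joined.index_of(Some("t2"), "id"), Ok(2));
       // Names that remain unique still resolve without a qualifier.
       assert_eq!(joined.index_of(None, "score"), Ok(3));
       println!("ok");
   }
   ```

   Once fields are resolved, the physical plan can refer to columns purely by position, which stays unambiguous no matter how many inputs share a column name.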
   

