Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/26 13:27:38 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #155: Join Statement: Schema contains duplicate unqualified field name

alamb opened a new issue #155:
URL: https://github.com/apache/arrow-datafusion/issues/155


   *Note*: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11432
   
   https://github.com/apache/arrow/issues/9307


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #155: Join Statement: Schema contains duplicate unqualified field name

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #155:
URL: https://github.com/apache/arrow-datafusion/issues/155#issuecomment-826834922


   Comment from GANG LIAO(gangliao) @ 2021-01-29T20:09:31.033+0000:
   <pre>The join statement cannot distinguish between two columns that have the same name but come from two different tables. [https://github.com/apache/arrow/issues/9307]
   
   [~jorgecarleitao]  </pre>
   
   Comment from R J(TurnOfACard) @ 2021-02-18T11:19:34.301+0000:
   <pre>When building a schema using `datafusion::logical_plan::plan::LogicalPlan::schema()`, the returned schema lacks a table qualifier. I think that building the TableScan allocates a table_name, but the returned `datafusion::logical_plan::dfschema::DFField`s have no reference to the provided table name. Attempting to join the schemas (in `datafusion::logical_plan::dfschema::DFSchema::new`) then results in a list of fields with conflicting names, because they lack qualifiers.
   
   I am new to DataFusion but would be happy to add some unit tests and try to look at this on the weekend.</pre>
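
The failure mode described in the comment above can be illustrated with a minimal, self-contained sketch. It does not use DataFusion's actual types: `Field`, `qualified_name`, and `join_schema` are hypothetical stand-ins invented for this illustration, and the error text only imitates the issue title. The sketch assumes that a joined schema rejects any two fields whose (optionally qualified) names collide.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a schema field; this is NOT DataFusion's `DFField`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    /// Table qualifier, e.g. "t1"; `None` models a field that lost its table name.
    qualifier: Option<String>,
    name: String,
}

impl Field {
    fn new(qualifier: Option<&str>, name: &str) -> Self {
        Field {
            qualifier: qualifier.map(str::to_string),
            name: name.to_string(),
        }
    }

    /// Returns "table.column" when a qualifier is present, otherwise just "column".
    fn qualified_name(&self) -> String {
        match &self.qualifier {
            Some(q) => format!("{}.{}", q, self.name),
            None => self.name.clone(),
        }
    }
}

/// Hypothetical join-schema builder: concatenates both sides and fails
/// as soon as two fields share the same (optionally qualified) name.
fn join_schema(left: &[Field], right: &[Field]) -> Result<Vec<Field>, String> {
    let mut seen = HashSet::new();
    let mut fields = Vec::with_capacity(left.len() + right.len());
    for field in left.iter().chain(right.iter()) {
        let key = field.qualified_name();
        // `insert` returns false if the key was already present.
        if !seen.insert(key.clone()) {
            return Err(format!("Schema contains duplicate field name '{}'", key));
        }
        fields.push(field.clone());
    }
    Ok(fields)
}
```

With this sketch, joining two sides that both expose an unqualified `id` field returns an error, whereas the same join succeeds once the fields carry the qualifiers `t1` and `t2`. That is the behaviour one would expect if the table name were propagated into the fields.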
   
   Comment from R J(TurnOfACard) @ 2021-02-20T22:59:14.507+0000:
   <pre>I'm not sure there is an easy fix without making breaking changes to the public API. When building a join schema, it checks if the join set is valid (physical_plan::hash_utils::check_join_set_is_valid), which has a parent public API call (physical_plan::hash_utils::check_join_is_valid). This join is unaware of the registered name (CSV or parquet) as it is performed with arrow schemas rather than DataFusion schemas.
   
   EDIT:
   
   It could be my lack of knowledge of the DataFusion codebase, but it appears it would need a lot of changes.</pre>





[GitHub] [arrow-datafusion] houqp closed issue #155: Join Statement: Schema contains duplicate unqualified field name

Posted by GitBox <gi...@apache.org>.
houqp closed issue #155:
URL: https://github.com/apache/arrow-datafusion/issues/155


   

