
Posted to github@arrow.apache.org by "waynexia (via GitHub)" <gi...@apache.org> on 2023/03/08 06:55:12 UTC

[GitHub] [arrow-datafusion] waynexia opened a new issue, #5513: `all_schema()` will skip ExtensionPlan's own schema and fetch schemas from child plans

waynexia opened a new issue, #5513:
URL: https://github.com/apache/arrow-datafusion/issues/5513

   **Describe the bug**
   
   I found a regression while debugging https://github.com/GreptimeTeam/greptimedb/issues/1136. Since https://github.com/apache/arrow-datafusion/pull/5236, `all_schema()` no longer returns the schema of an `ExtensionPlan` itself; it returns its children's schemas instead.
   
   **To Reproduce**
   Not important
   
   **Expected behavior**
   
   `all_schema()` should use the schema of the `ExtensionPlan` itself, not the schemas of its children.
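   A minimal, self-contained model of the regressed versus expected behavior. `Plan`, `Schema`, `all_schemas_regressed`, and `all_schemas_expected` are hypothetical stand-ins, not DataFusion's `LogicalPlan` or its actual `all_schema()` code. The extension node's schema contains an output column (`value_sum`) that its child does not have, so skipping the node's own schema makes that column unresolvable.

   ```rust
   // Toy model, NOT DataFusion's real types: a schema is just a list of field names.
   type Schema = Vec<&'static str>;

   enum Plan {
       // Leaf node that owns a schema and has no inputs.
       Scan { schema: Schema },
       // Stand-in for a user-defined extension node with its own output schema.
       Extension { schema: Schema, inputs: Vec<Plan> },
   }

   impl Plan {
       fn schema(&self) -> &Schema {
           match self {
               Plan::Scan { schema } | Plan::Extension { schema, .. } => schema,
           }
       }

       fn inputs(&self) -> Vec<&Plan> {
           match self {
               Plan::Scan { .. } => vec![],
               Plan::Extension { inputs, .. } => inputs.iter().collect(),
           }
       }

       // Regressed behavior from the issue: for the extension node, its own schema
       // is skipped and only the children's schemas are returned.
       fn all_schemas_regressed(&self) -> Vec<&Schema> {
           match self {
               Plan::Scan { schema } => vec![schema],
               Plan::Extension { .. } => self.inputs().into_iter().map(|p| p.schema()).collect(),
           }
       }

       // Expected behavior: the extension node reports its own schema.
       fn all_schemas_expected(&self) -> Vec<&Schema> {
           vec![self.schema()]
       }
   }

   fn main() {
       // The extension computes `value_sum`, a column its child does not have.
       let plan = Plan::Extension {
           schema: vec!["ts", "value_sum"],
           inputs: vec![Plan::Scan { schema: vec!["ts", "value"] }],
       };
       println!("{:?}", plan.all_schemas_regressed()); // [["ts", "value"]]
       println!("{:?}", plan.all_schemas_expected()); // [["ts", "value_sum"]]
   }
   ```

   Whether the fixed version should also append the children's schemas is left open here; the sketch only encodes the requirement stated above, that the node's own schema is used.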
   
   **Additional context**
   
   As @alamb and @jackwener discussed in https://github.com/apache/arrow-datafusion/pull/5236#pullrequestreview-1293793794, `all_schema()` seems unnecessary. As far as I can tell, every use of `all_schema()` is for resolving column names: the collected schemas are passed to `normalize_with_schemas()`. However, only a few plans can have multiple inputs (join, union, and possibly extension). I'm wondering whether we could handle those plans specially inside `normalize_with_schemas()` and remove `all_schema()` altogether (the `using_columns` argument of `normalize_with_schemas()` already seems to do something along these lines).
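   To make the proposal concrete, here is a rough sketch, under stated assumptions, of resolving an unqualified column directly against the input schemas of a multi-input plan, without first collecting schemas through `all_schema()`. `InputSchema`, `normalize_column`, and the error strings are hypothetical; this is a simplified model, not DataFusion's `normalize_with_schemas()` API. The `using_columns` set here only loosely mirrors the role of the argument of the same name: a column that appears in several inputs is treated as ambiguous unless it is a `USING` join key.

   ```rust
   use std::collections::HashSet;

   // Hypothetical stand-in for a qualified schema: a relation name plus its field names.
   struct InputSchema {
       relation: &'static str,
       fields: Vec<&'static str>,
   }

   // Hypothetical helper: qualify `column` by searching the schemas of a plan's direct
   // inputs. A column found in more than one input is ambiguous unless it is listed in
   // `using_columns` (e.g. the key of `JOIN ... USING (col)`).
   fn normalize_column(
       column: &str,
       inputs: &[InputSchema],
       using_columns: &HashSet<&str>,
   ) -> Result<String, String> {
       // Every input whose schema contains the column.
       let matches: Vec<&InputSchema> = inputs
           .iter()
           .filter(|s| s.fields.iter().any(|f| *f == column))
           .collect();
       match matches.as_slice() {
           [] => Err(format!("column '{}' not found", column)),
           [only] => Ok(format!("{}.{}", only.relation, column)),
           // Several inputs have it, but it is a USING key: pick the first side.
           [first, ..] if using_columns.contains(column) => {
               Ok(format!("{}.{}", first.relation, column))
           }
           _ => Err(format!("column '{}' is ambiguous", column)),
       }
   }

   fn main() {
       // A two-input plan such as a join: both sides expose `id`, only `t1` exposes `name`.
       let inputs = vec![
           InputSchema { relation: "t1", fields: vec!["id", "name"] },
           InputSchema { relation: "t2", fields: vec!["id", "score"] },
       ];
       let no_using: HashSet<&str> = HashSet::new();
       let using_id: HashSet<&str> = HashSet::from(["id"]);

       println!("{:?}", normalize_column("name", &inputs, &no_using)); // Ok("t1.name")
       println!("{:?}", normalize_column("id", &inputs, &no_using)); // Err("column 'id' is ambiguous")
       println!("{:?}", normalize_column("id", &inputs, &using_id)); // Ok("t1.id")
   }
   ```

   The point of the sketch is that a multi-input plan only needs its direct inputs' schemas to resolve a column, so a separate "collect all schemas" step may not be required; single-input plans reduce to the `[only]` case.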


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.