Posted to jira@arrow.apache.org by "Jorge (Jira)" <ji...@apache.org> on 2020/09/12 09:45:00 UTC

[jira] [Assigned] (ARROW-9809) [Rust] [DataFusion] logical schema = physical schema is not true

     [ https://issues.apache.org/jira/browse/ARROW-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jorge reassigned ARROW-9809:
----------------------------

    Assignee: Jorge

> [Rust] [DataFusion] logical schema = physical schema is not true
> ----------------------------------------------------------------
>
>                 Key: ARROW-9809
>                 URL: https://issues.apache.org/jira/browse/ARROW-9809
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust, Rust - DataFusion
>            Reporter: Jorge
>            Assignee: Jorge
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In tests/sql.rs, we assert that the physical schema and the optimized logical schema match. However, this does not necessarily hold for all our queries. An example:
> {code:java}
> #[test]
> fn csv_query_sum_cast() {
>     let mut ctx = ExecutionContext::new();
>     register_aggregate_csv_by_sql(&mut ctx);
>     // c8 = i32; c9 = i64
>     let sql = "SELECT c8 + c9 FROM aggregate_test_100";
>     // check that the physical and logical schemas are equal
>     execute(&mut ctx, sql);
> }
> {code}
> The physical expression (and schema) of this operation, after optimization, is {{CAST(c8 as Int64) Plus c9}} (this test fails).
> AFAIK, the invariant of the optimizer is that the output types and nullability are the same.
> Also, note that the reason the optimized logical schema equals the logical schema is that our type coercer does not change the output names of the schema, even though it rewrites logical expressions. I.e. after the optimization, {{.to_field()}} of an expression may no longer match the field name or type in the plan's schema. IMO this is currently by (implicit?) design, as we do not want our logical schema's column names to change during optimizations, or all column references may point to non-existent columns. This was brought up on the mailing list in a discussion about polymorphism.
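
The mismatch described above can be reproduced in a toy model. The sketch below is not DataFusion's API: the Expr, DataType and Schema types and the coerce, name and data_type functions are all hypothetical stand-ins, written only to show how a type-coercion pass can preserve the output type of an expression while changing the name derived from it.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the real types (assumption, not DataFusion's API).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Cast(Box<Expr>, DataType),
    Plus(Box<Expr>, Box<Expr>),
}

// Maps input column names to their types.
type Schema = HashMap<String, DataType>;

// The wider of two integer types.
fn widest(a: DataType, b: DataType) -> DataType {
    if a == DataType::Int64 || b == DataType::Int64 {
        DataType::Int64
    } else {
        DataType::Int32
    }
}

// Output type of an expression against the input schema.
fn data_type(expr: &Expr, input: &Schema) -> DataType {
    match expr {
        Expr::Column(name) => input[name],
        Expr::Cast(_, to) => *to,
        Expr::Plus(l, r) => widest(data_type(l, input), data_type(r, input)),
    }
}

// Name derived from the expression itself, loosely mirroring what a
// `.to_field()`-style call would produce.
fn name(expr: &Expr) -> String {
    match expr {
        Expr::Column(n) => n.clone(),
        Expr::Cast(e, to) => format!("CAST({} as {:?})", name(e), to),
        Expr::Plus(l, r) => format!("{} Plus {}", name(l), name(r)),
    }
}

// Wraps `expr` in a cast only when its type differs from `target`.
fn cast_if_needed(expr: Expr, target: DataType, input: &Schema) -> Expr {
    if data_type(&expr, input) == target {
        expr
    } else {
        Expr::Cast(Box::new(expr), target)
    }
}

// Toy type coercion: make both operands of Plus share the wider type.
fn coerce(expr: &Expr, input: &Schema) -> Expr {
    match expr {
        Expr::Plus(l, r) => {
            let (l, r) = (coerce(l, input), coerce(r, input));
            let target = widest(data_type(&l, input), data_type(&r, input));
            Expr::Plus(
                Box::new(cast_if_needed(l, target, input)),
                Box::new(cast_if_needed(r, target, input)),
            )
        }
        other => other.clone(),
    }
}

// Column types taken from the comment in the test above: c8 = i32, c9 = i64.
fn example_schema() -> Schema {
    let mut input = Schema::new();
    input.insert("c8".to_string(), DataType::Int32);
    input.insert("c9".to_string(), DataType::Int64);
    input
}

fn main() {
    let input = example_schema();
    let logical = Expr::Plus(
        Box::new(Expr::Column("c8".to_string())),
        Box::new(Expr::Column("c9".to_string())),
    );
    let coerced = coerce(&logical, &input);

    println!("{}", name(&logical)); // c8 Plus c9
    println!("{}", name(&coerced)); // CAST(c8 as Int64) Plus c9
    // The output type is unchanged by the rewrite.
    assert_eq!(data_type(&logical, &input), data_type(&coerced, &input));
}
```

In this model, a schema built from the rewritten expression would name its field "CAST(c8 as Int64) Plus c9", while a schema carried over from the original plan keeps "c8 Plus c9". A comparison of the two that includes field names would therefore fail even though the types agree.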


