You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "Kyle McCarthy (Jira)" <ji...@apache.org> on 2019/12/30 22:41:00 UTC

[jira] [Created] (ARROW-7480) [Rust] [DataFusion] Query fails/incorrect when aggregated + grouped columns don't match the selected columns

Kyle McCarthy created ARROW-7480:
------------------------------------

             Summary: [Rust] [DataFusion] Query fails/incorrect when aggregated + grouped columns don't match the selected columns
                 Key: ARROW-7480
                 URL: https://issues.apache.org/jira/browse/ARROW-7480
             Project: Apache Arrow
          Issue Type: Bug
          Components: Rust, Rust - DataFusion
            Reporter: Kyle McCarthy


**There are two scenarios that cause problems but are related to the queries with aggregate expressions and the SQL planner.

*Scenario 1*

Columns are grouped by but not selected.

 
{noformat}
ArrowError(InvalidArgumentError("number of columns must match number of fields in schema")){noformat}
 

You can reproduce with the aggregate_test_100 dataset with the query:

 
{code:java}
SELECT c1, MIN(c12) FROM aggregate_test_100 GROUP BY c1, c13{code}
 

*Senario 2*

 

Columns are selected, but not grouped or part of an aggregate function. This query actually will run, but the wrong schema is produced.

Query 

 
{code:java}
SELECT c1, c13, MIN(c12) FROM aggregate_test_100 GROUP BY c1{code}
Schema generated:

 
{code:java}
Schema {
    fields: [
        Field {
            name: "c0",
            data_type: Utf8,
            nullable: true,
        },
        Field {
            name: "c1",
            data_type: Float64,
            nullable: true,
        },
        Field {
            name: "c1",
            data_type: Float64,
            nullable: true,
        },
    ],
    metadata: {},
} {code}
----
Schema 2 is questionable since some DBMS will run the query (ex MySQL) but others will require that all the columns must be in the GROUP BY to be used in an aggregate function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)