Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2020/10/09 10:52:00 UTC

[jira] [Assigned] (ARROW-9733) [Rust][DataFusion] Aggregates COUNT/MIN/MAX don't work on VARCHAR columns

     [ https://issues.apache.org/jira/browse/ARROW-9733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jorge Leitão reassigned ARROW-9733:
-----------------------------------

    Assignee: Jorge Leitão

> [Rust][DataFusion] Aggregates COUNT/MIN/MAX don't work on VARCHAR columns
> -------------------------------------------------------------------------
>
>                 Key: ARROW-9733
>                 URL: https://issues.apache.org/jira/browse/ARROW-9733
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust, Rust - DataFusion
>            Reporter: Andrew Lamb
>            Assignee: Jorge Leitão
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>         Attachments: repro.csv
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> h2. Reproducer:
> Create a table with a string column:
> {code}
> CREATE EXTERNAL TABLE repro(a INT, b VARCHAR)
> STORED AS CSV
> WITH HEADER ROW
> LOCATION 'repro.csv';
> {code}
> The contents of repro.csv are as follows (also attached):
> {code}
> a,b
> 1,One
> 1,Two
> 2,One
> 2,Two
> 2,Two
> {code}
> Now, run a query that tries to aggregate that column:
> {code}
> select a, count(b) from repro group by a;
> {code}
> *Actual behavior*:
> {code}
> > select a, count(b) from repro group by a;
> ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
> {code}
> *Expected behavior*:
> The query runs and produces the following results:
> {code}
> a, count(b)
> 1,2
> 2,3
> {code}
> h2. Discussion
> MIN and MAX aggregates on a VARCHAR column fail with the same error (but should work):
> {code}
> > select a, min(b) from repro group by a;
> ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
> > select a, max(b) from repro group by a;
> ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
> {code}
> Fascinatingly, these formulations, which aggregate the INT column or a literal instead of the VARCHAR column, work fine:
> {code}
> > select a, count(a) from repro group by a;
> +---+----------+
> | a | count(a) |
> +---+----------+
> | 2 | 3        |
> | 1 | 2        |
> +---+----------+
> 2 row in set. Query took 0 seconds.
> > select a, count(1) from repro group by a;
> +---+-----------------+
> | a | count(UInt8(1)) |
> +---+-----------------+
> | 2 | 3               |
> | 1 | 2               |
> +---+-----------------+
> 2 row in set. Query took 0 seconds.
> {code}
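
For reference, the results the failing queries should produce on repro.csv can be checked with a small standalone sketch. This is not DataFusion code and does not use the Arrow API; the names StringAggregates, aggregate_by_key and repro_rows are hypothetical, NULL inputs are skipped as in standard SQL aggregates, and MIN/MAX are assumed to use Rust's byte-wise string ordering.

```rust
use std::collections::BTreeMap;

/// Per-group results of COUNT(b), MIN(b) and MAX(b) for a string column.
/// Hypothetical type for illustration only; not part of DataFusion.
#[derive(Debug, Default, PartialEq)]
struct StringAggregates {
    count: u64,
    min: Option<String>,
    max: Option<String>,
}

/// Groups `(a, b)` rows by `a` and folds COUNT/MIN/MAX over the non-null `b` values.
fn aggregate_by_key(rows: &[(i32, Option<&str>)]) -> BTreeMap<i32, StringAggregates> {
    let mut groups: BTreeMap<i32, StringAggregates> = BTreeMap::new();
    for &(key, value) in rows {
        // Create the group even if every value in it turns out to be NULL.
        let agg = groups.entry(key).or_default();
        // Assumption: NULL inputs are ignored, as in standard SQL aggregates.
        if let Some(v) = value {
            agg.count += 1;
            // Assumption: strings are ordered byte-wise (Rust's default `str` ordering).
            if agg.min.as_deref().map_or(true, |m| v < m) {
                agg.min = Some(v.to_string());
            }
            if agg.max.as_deref().map_or(true, |m| v > m) {
                agg.max = Some(v.to_string());
            }
        }
    }
    groups
}

/// The five data rows of repro.csv from the report.
fn repro_rows() -> Vec<(i32, Option<&'static str>)> {
    vec![
        (1, Some("One")),
        (1, Some("Two")),
        (2, Some("One")),
        (2, Some("Two")),
        (2, Some("Two")),
    ]
}
```

Calling aggregate_by_key(&repro_rows()) yields a count of 2 for a = 1 and 3 for a = 2, matching the expected output in the report; under the ordering assumption above, MIN(b) is "One" and MAX(b) is "Two" in both groups.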


