Posted to issues@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2020/08/13 19:33:00 UTC

[jira] [Created] (ARROW-9733) [Rust][DataFusion] Aggregates COUNT/MIN/MAX don't work on VARCHAR columns

Andrew Lamb created ARROW-9733:
----------------------------------

             Summary: [Rust][DataFusion] Aggregates COUNT/MIN/MAX don't work on VARCHAR columns
                 Key: ARROW-9733
                 URL: https://issues.apache.org/jira/browse/ARROW-9733
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Andrew Lamb
         Attachments: repro.csv

h2. Reproducer:

Create a table with a string column:
{code}
CREATE EXTERNAL TABLE repro(a INT, b VARCHAR)
STORED AS CSV
WITH HEADER ROW
LOCATION 'repro.csv';
{code}


The contents of repro.csv are as follows (also attached):
{code}
a,b
1,One
1,Two
2,One
2,Two
2,Two
{code}


Now, run a query that tries to aggregate that column:
{code}
select a, count(b) from repro group by a;
{code}


*Actual behavior*:
{code}
> select a, count(b) from repro group by a;
ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
{code}

*Expected behavior*:
The query runs and produces the following results:
{code}
a, count(b)
1,2
2,3
{code}
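For reference, the expected result can be checked by hand. The Rust sketch below uses only std (it is not DataFusion code, and the function name `count_b_by_a` is hypothetical). It groups the rows of repro.csv by `a` and counts non-empty values of `b`; treating an empty field as NULL is an assumption made for illustration. The count never inspects the string contents, which is why COUNT on a VARCHAR column should be no harder than COUNT on an INT column.

```rust
use std::collections::BTreeMap;

// Contents of repro.csv from this report, embedded so the sketch is self-contained.
const REPRO_CSV: &str = "a,b\n1,One\n1,Two\n2,One\n2,Two\n2,Two\n";

/// Hand-computed equivalent of `SELECT a, count(b) FROM repro GROUP BY a`.
/// Only the presence of `b` matters; its string value is never examined.
fn count_b_by_a(csv: &str) -> BTreeMap<i32, u64> {
    let mut counts: BTreeMap<i32, u64> = BTreeMap::new();
    for line in csv.lines().skip(1) {
        // skip(1) drops the header row
        let mut fields = line.splitn(2, ',');
        let a: i32 = fields
            .next()
            .unwrap()
            .trim()
            .parse()
            .expect("column a must be an integer");
        // Assumption: an empty `b` field stands in for NULL, which count(b) skips.
        let b = fields.next().map(str::trim).filter(|s| !s.is_empty());
        let entry = counts.entry(a).or_insert(0);
        if b.is_some() {
            *entry += 1;
        }
    }
    counts
}

fn main() {
    // BTreeMap iterates in ascending key order: prints "1,2" then "2,3".
    for (a, n) in count_b_by_a(REPRO_CSV) {
        println!("{},{}", a, n);
    }
}
```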

h2. Discussion

MIN/MAX aggregates on VARCHAR columns also fail, but should work:

{code}
> select a, min(b) from repro group by a;
ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
> select a, max(b) from repro group by a;
ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
{code}
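Assuming MIN/MAX on strings use byte-wise lexicographic comparison (which is what Rust's `Ord` for `str` provides), the expected answer for both groups would be `min(b) = One` and `max(b) = Two`. A minimal std-only sketch of that per-group accumulation follows; the helper `min_max_b_by_a` is hypothetical and is not DataFusion's implementation.

```rust
use std::collections::BTreeMap;

/// Per-group MIN and MAX over a string column, keyed by `a`.
/// Comparison uses Rust's `Ord` for `str` (byte-wise lexicographic order).
fn min_max_b_by_a<'a>(rows: &[(i32, &'a str)]) -> BTreeMap<i32, (&'a str, &'a str)> {
    let mut acc: BTreeMap<i32, (&'a str, &'a str)> = BTreeMap::new();
    for &(a, b) in rows {
        acc.entry(a)
            .and_modify(|(lo, hi)| {
                // Update the running minimum and maximum for this group.
                if b < *lo {
                    *lo = b;
                }
                if b > *hi {
                    *hi = b;
                }
            })
            // First value seen for a group is both its min and its max.
            .or_insert((b, b));
    }
    acc
}

fn main() {
    // The (a, b) rows of repro.csv.
    let rows = [(1, "One"), (1, "Two"), (2, "One"), (2, "Two"), (2, "Two")];
    // Prints "1,One,Two" then "2,One,Two".
    for (a, (min_b, max_b)) in min_max_b_by_a(&rows) {
        println!("{},{},{}", a, min_b, max_b);
    }
}
```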


Interestingly, these formulations work fine:

{code}
> select a, count(a) from repro group by a;
+---+----------+
| a | count(a) |
+---+----------+
| 2 | 3        |
| 1 | 2        |
+---+----------+
2 row in set. Query took 0 seconds.
> select a, count(1) from repro group by a;
+---+-----------------+
| a | count(UInt8(1)) |
+---+-----------------+
| 2 | 3               |
| 1 | 2               |
+---+-----------------+
2 row in set. Query took 0 seconds.
{code}
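Since `count(a)` and `count(1)` succeed while `count(b)` fails, the failure appears to depend on the type of COUNT's argument, even though COUNT's result type should not. The sketch below states the intended typing rule using a hypothetical, pared-down `DataType` enum (not Arrow's actual type) and a hypothetical `aggregate_result_type` helper; it assumes COUNT returns an unsigned 64-bit integer. MIN/MAX, in contrast, return a value of the input type, so supporting them on VARCHAR means handling `Utf8` results.

```rust
/// Hypothetical, pared-down stand-in for Arrow data types;
/// only the variants relevant to this report are included.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    UInt8,
    Int32,
    UInt64,
    Utf8,
}

#[derive(Debug, Clone, Copy)]
enum Aggregate {
    Count,
    Min,
    Max,
}

/// Hypothetical helper: result type of an aggregate given its input type.
/// COUNT ignores the input type (assumed UInt64 here);
/// MIN/MAX produce a value of the input type, including Utf8.
fn aggregate_result_type(agg: Aggregate, input: DataType) -> DataType {
    match agg {
        Aggregate::Count => DataType::UInt64,
        Aggregate::Min | Aggregate::Max => input,
    }
}

fn main() {
    // count(a), count(1) and count(b) should all share one result type.
    for &input in &[DataType::Int32, DataType::UInt8, DataType::Utf8] {
        println!(
            "count({:?}) -> {:?}",
            input,
            aggregate_result_type(Aggregate::Count, input)
        );
    }
    println!(
        "max(Utf8) -> {:?}",
        aggregate_result_type(Aggregate::Max, DataType::Utf8)
    );
}
```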




--
This message was sent by Atlassian Jira
(v8.3.4#803005)