Posted to issues@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2020/08/13 19:33:00 UTC
[jira] [Created] (ARROW-9733) [Rust][DataFusion] Aggregates COUNT/MIN/MAX don't work on VARCHAR columns
Andrew Lamb created ARROW-9733:
----------------------------------
Summary: [Rust][DataFusion] Aggregates COUNT/MIN/MAX don't work on VARCHAR columns
Key: ARROW-9733
URL: https://issues.apache.org/jira/browse/ARROW-9733
Project: Apache Arrow
Issue Type: Bug
Reporter: Andrew Lamb
Attachments: repro.csv
h2. Reproducer:
Create a table with a string column:
{code}
CREATE EXTERNAL TABLE repro(a INT, b VARCHAR)
STORED AS CSV
WITH HEADER ROW
LOCATION 'repro.csv';
{code}
The contents of repro.csv are as follows (also attached):
{code}
a,b
1,One
1,Two
2,One
2,Two
2,Two
{code}
Now, run a query that tries to aggregate that column:
{code}
select a, count(b) from repro group by a;
{code}
*Actual behavior*:
{code}
> select a, count(b) from repro group by a;
ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
{code}
*Expected behavior*:
The query runs and produces the following results:
{code}
a,count(b)
1,2
2,3
{code}
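For reference, the expected output follows directly from COUNT's SQL semantics: within each group, COUNT(b) counts the non-NULL values of b, and the argument's type (a string here) plays no role. Below is a minimal std-only Rust sketch of that semantics with the repro.csv rows hard-coded; `grouped_count` is a hypothetical helper for illustration, not a DataFusion API.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: emulates `SELECT a, COUNT(b) FROM t GROUP BY a`.
/// `None` in `b` stands for a SQL NULL, which COUNT(b) skips.
fn grouped_count(rows: &[(i32, Option<&str>)]) -> BTreeMap<i32, u64> {
    let mut counts = BTreeMap::new();
    for (a, b) in rows {
        // Every group gets an entry, even if all of its b values are NULL.
        let entry = counts.entry(*a).or_insert(0u64);
        if b.is_some() {
            *entry += 1;
        }
    }
    counts
}

fn main() {
    // Contents of repro.csv from the issue.
    let rows = [
        (1, Some("One")),
        (1, Some("Two")),
        (2, Some("One")),
        (2, Some("Two")),
        (2, Some("Two")),
    ];
    // The count is an integer even though b is a string column.
    for (a, n) in &grouped_count(&rows) {
        println!("{},{}", a, n);
    }
}
```

Run as-is, it prints `1,2` and `2,3`, matching the expected result above.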
h2. Discussion
Using the MIN/MAX aggregates on VARCHAR columns also fails, although it should work:
{code}
> select a, min(b) from repro group by a;
ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
> select a, max(b) from repro group by a;
ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
{code}
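MIN and MAX are well defined for strings under lexicographic ordering, and their result has the same type as the argument (Utf8 here). Below is a minimal std-only Rust sketch of per-group MIN/MAX over the same rows. `grouped_min_max` is hypothetical, and it assumes plain byte-wise string comparison (Rust's `str` ordering) rather than any collation a SQL engine might apply.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: emulates `SELECT a, MIN(b), MAX(b) FROM t GROUP BY a`.
/// NULLs (`None`) are ignored; a group whose b values are all NULL yields (None, None).
fn grouped_min_max<'a>(
    rows: &[(i32, Option<&'a str>)],
) -> BTreeMap<i32, (Option<&'a str>, Option<&'a str>)> {
    let mut out: BTreeMap<i32, (Option<&'a str>, Option<&'a str>)> = BTreeMap::new();
    for &(a, b) in rows {
        let (min, max) = out.entry(a).or_insert((None, None));
        if let Some(v) = b {
            // `str` ordering is byte-wise lexicographic.
            if min.map_or(true, |m| v < m) {
                *min = Some(v);
            }
            if max.map_or(true, |m| v > m) {
                *max = Some(v);
            }
        }
    }
    out
}

fn main() {
    // Contents of repro.csv from the issue.
    let rows = [
        (1, Some("One")),
        (1, Some("Two")),
        (2, Some("One")),
        (2, Some("Two")),
        (2, Some("Two")),
    ];
    for (a, (min, max)) in grouped_min_max(&rows) {
        println!("{} min={:?} max={:?}", a, min, max);
    }
}
```

For this data, both groups should yield MIN = "One" and MAX = "Two".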
Interestingly, these formulations work fine, which suggests the failure is specific to aggregating a VARCHAR (Utf8) argument:
{code}
> select a, count(a) from repro group by a;
+---+----------+
| a | count(a) |
+---+----------+
| 2 | 3 |
| 1 | 2 |
+---+----------+
2 row in set. Query took 0 seconds.
> select a, count(1) from repro group by a;
+---+-----------------+
| a | count(UInt8(1)) |
+---+-----------------+
| 2 | 3 |
| 1 | 2 |
+---+-----------------+
2 row in set. Query took 0 seconds.
{code}
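The working cases aggregate integer arguments (column a, declared INT, and the literal shown as UInt8(1)), while the failing ones aggregate the Utf8 column b. Whatever the internal cause, the expected typing is simple: COUNT's result type is fixed regardless of its argument, and MIN/MAX return their argument's type. Below is a sketch of that rule. It uses a hypothetical trimmed-down `DataType` enum (not Arrow's real one) and assumes COUNT returns UInt64.

```rust
/// Hypothetical, trimmed-down stand-in for a column type enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int32,
    UInt8,
    UInt64,
    Utf8,
}

#[derive(Debug, Clone, Copy)]
enum Aggregate {
    Count,
    Min,
    Max,
}

/// Result type of an aggregate given its argument type (assumed rule).
fn return_type(agg: Aggregate, arg: DataType) -> DataType {
    match agg {
        // COUNT only counts non-NULL values, so its type never depends on `arg`.
        Aggregate::Count => DataType::UInt64,
        // MIN/MAX return one of the input values, so they keep the input type.
        Aggregate::Min | Aggregate::Max => arg,
    }
}

fn main() {
    for arg in [DataType::Int32, DataType::UInt8, DataType::Utf8] {
        for agg in [Aggregate::Count, Aggregate::Min, Aggregate::Max] {
            println!("{:?}({:?}) -> {:?}", agg, arg, return_type(agg, arg));
        }
    }
}
```

Under this rule, COUNT(b) has the same integer result type as COUNT(a), and only MIN(b)/MAX(b) produce a Utf8 result.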
--
This message was sent by Atlassian Jira
(v8.3.4#803005)