Posted to jira@arrow.apache.org by "Michal Nowakiewicz (Jira)" <ji...@apache.org> on 2021/05/10 18:17:00 UTC
[jira] [Created] (ARROW-12728) [C++][Compute] Aggregates: implement count distinct
Michal Nowakiewicz created ARROW-12728:
------------------------------------------
Summary: [C++][Compute] Aggregates: implement count distinct
Key: ARROW-12728
URL: https://issues.apache.org/jira/browse/ARROW-12728
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 4.0.0
Reporter: Michal Nowakiewicz
Fix For: 5.0.0
Implement a count distinct aggregate that internally reuses the hash table from hash group by.
This adds support for SQL queries such as:
select a, count(distinct b), count(distinct c) from t group by a
For instance, to compute count(distinct b), the first group id mapping assigns a group id based on the value of column a. Then, inside the count(distinct b) aggregate, a second group id mapping is computed using the key (groupid(a), b). The same applies to count(distinct c).
After all input rows are consumed, the final processing step scans the hash tables keyed on (groupid(a), b) and increments an array of counts indexed by groupid(a).
The resulting array of counts is the output of the count distinct aggregate.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)