Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2022/01/01 11:49:00 UTC
[jira] [Commented] (ARROW-12159) [Rust][DataFusion] Support grouping on expressions
[ https://issues.apache.org/jira/browse/ARROW-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467410#comment-17467410 ]
Andrew Lamb commented on ARROW-12159:
-------------------------------------
Yes, this should have been filed in arrow-datafusion (or maybe it was even fixed while DataFusion lived in the main arrow repo). It turns out that this feature is already implemented, so I am closing this ticket.
{code}
❯ select x + 1, sum(y) from foo group by x + 1;
+---------------------+------------+
| foo.x Plus Int64(1) | SUM(foo.y) |
+---------------------+------------+
| 2 | 2 |
+---------------------+------------+
{code}
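For illustration only, here is a minimal sketch of the same idea applied to the time-window use case from the original report. It makes several assumptions: it uses Python's built-in sqlite3 module as a stand-in engine so the snippet runs without DataFusion installed, it uses integer division on the nanosecond timestamps in place of `date_trunc`, and it runs on a trimmed sample of the table that includes one made-up row so that two minute buckets appear. It is not DataFusion's implementation.

```python
import sqlite3

# Assumption: SQLite stands in for DataFusion purely so this runs with the
# Python standard library. The table is a trimmed, partly invented sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpu (cpu TEXT, time INTEGER, usage_user REAL)")
conn.executemany(
    "INSERT INTO cpu VALUES (?, ?, ?)",
    [
        ("cpu0", 1617130130000000000, 16.251246261217506),
        ("cpu1", 1617130130000000000, 12.37524950097282),
        ("cpu0", 1617130190000000000, 20.0),  # hypothetical row, 60 s later
    ],
)

NANOS_PER_MINUTE = 60 * 1_000_000_000

# Group on an expression rather than a bare column: integer division
# truncates each nanosecond timestamp to the start of its minute.
rows = conn.execute(
    """
    SELECT (time / :m) * :m AS minute_start,
           MIN(usage_user),
           MAX(usage_user)
    FROM cpu
    GROUP BY (time / :m) * :m
    ORDER BY minute_start
    """,
    {"m": NANOS_PER_MINUTE},
).fetchall()

for minute_start, lo, hi in rows:
    print(minute_start, lo, hi)
```

The first two sample rows fall into the same minute bucket and the hypothetical third row falls into the next one, so the query returns one row per minute with that minute's minimum and maximum `usage_user`.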
> [Rust][DataFusion] Support grouping on expressions
> --------------------------------------------------
>
> Key: ARROW-12159
> URL: https://issues.apache.org/jira/browse/ARROW-12159
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust
> Reporter: Andrew Lamb
> Priority: Major
>
> Use case:
> I want to group based on time windows (as defined by the `date_trunc` function).
> For example, given the table:
> {code}
> +------+-------------------+---------------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
> | cpu | host | time | usage_guest | usage_guest_nice | usage_idle | usage_iowait | usage_irq | usage_nice | usage_softirq | usage_steal | usage_system | usage_user |
> +------+-------------------+---------------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
> | cpu0 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 65.30408773649165 | 0 | 0 | 0 | 0 | 0 | 18.444666002000673 | 16.251246261217506 |
> | cpu1 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 84.43113772402216 | 0 | 0 | 0 | 0 | 0 | 3.193612774446795 | 12.37524950097282 |
> | cpu2 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 65.96806387199344 | 0 | 0 | 0 | 0 | 0 | 15.469061876247794 | 18.56287425146831 |
> | cpu3 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 84.0478564307993 | 0 | 0 | 0 | 0 | 0 | 3.0907278165770684 | 12.861415752863932 |
> | cpu4 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 63.21036889281897 | 0 | 0 | 0 | 0 | 0 | 13.758723828377473 | 23.030907278223218 |
> | cpu5 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 83.94815553242313 | 0 | 0 | 0 | 0 | 0 | 2.991026919231221 | 13.0608175473346 |
> | cpu6 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 70.85828343276965 | 0 | 0 | 0 | 0 | 0 | 12.87425149699077 | 16.26746506987651 |
> | cpu7 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 83.9321357287122 | 0 | 0 | 0 | 0 | 0 | 3.093812375243205 | 12.974051896176206 |
> | cpu8 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 74.80079681313936 | 0 | 0 | 0 | 0 | 0 | 10.756972111708253 | 14.442231075949556 |
> | cpu9 | MacBook-Pro.local | 1617130130000000000 | 0 | 0 | 83.84845463618315 | 0 | 0 | 0 | 0 | 0 | 3.0907278165434624 | 13.060817547316466 |
> +------+-------------------+---------------------+-------------+------------------+-------------------+--------------+-----------+------------+---------------+-------------+--------------------+--------------------+
> {code}
> I want to be able to find the min and max user usage grouped by minute:
> {code}
> select
> date_trunc('minute', cast (time as timestamp)),
> min(usage_user),
> max(usage_user)
> from
> cpu
> group by
> date_trunc('minute', cast (time as timestamp))
> {code}
> Or alternatively:
> {code}
> select
> date_trunc('minute', cast (time as timestamp)),
> min(usage_user),
> max(usage_user)
> from
> cpu
> group by
> 1
> {code}
> Instead, as of now, I get a planning error:
> {code}
> Error preparing query Error during planning: Projection references non-aggregate values
> {code}