You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/23 08:47:05 UTC
[GitHub] [arrow-datafusion] liukun4515 opened a new issue #2067: Support `sum0` as a built-in agg function
liukun4515 opened a new issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
The hive and flink support the `sum0` function.
https://github.com/apache/hive/blob/6f7c55ab9bc4fd7c3d0c2a6ba3095275b17b3d2d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveDruidRules.java#L210
https://nightlies.apache.org/flink/flink-docs-release-1.3/api/java/org/apache/flink/table/functions/aggfunctions/Sum0AggFunction.html
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/
The `sum0` is very similar to `sum` except for all null values.
If all values in the column are null, the result of `sum0` is `0`;
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] liukun4515 commented on issue #2067: Support `sum0` as a built-in agg function
Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1079807670
> Another potential option is to use a user defined aggregate for sum0.
Thanks.
I will try these two methods.
1. replace `sum0` with `coalesce` or `case when`
2. udaf
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] liukun4515 commented on issue #2067: Support `sum0` as a built-in agg function
Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076964514
@Dandandan @alamb Thanks for your suggestions.
Please hold this issue and don't close.
This feature is useful for me.
I will try to replace `sum0` with your suggestion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on issue #2067: Support `sum0` as a built-in agg function
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076759264
Hi @liukun4515 I think you can get the same effect as `sum0` using a `CASE` statement, as shown below. Given the very minor difference between `sum` and `sum0` it seems like it is not a great idea to add it as a built in to DataFusion from my perspective.
```sql
CASE
WHEN sum(x) IS NULL
THEN 0
ELSE sum(x)
END as sum0
```
A more full featured example:
```sql
DataFusion CLI v7.0.0
❯ create table foo as select * from (values (1), (null), (null)) as sq;
0 rows in set. Query took 0.044 seconds.
❯ select * from foo;
+---------+
| column1 |
+---------+
| 1 |
| |
| |
+---------+
3 rows in set. Query took 0.002 seconds.
-- Results when there are only nulls
❯ select case when sum(column1) IS NULL THEN 0 ELSE sum(column1) END as sum0 from foo where column1 IS NULL;
+------+
| sum0 |
+------+
| 0 |
+------+
1 row in set. Query took 0.005 seconds.
-- Results when there are nulls and non-null
❯ select case when sum(column1) IS NULL THEN 0 ELSE sum(column1) END as sum0 from foo;
+------+
| sum0 |
+------+
| 1 |
+------+
1 row in set. Query took 0.005 seconds.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] liukun4515 commented on issue #2067: Support `sum0` as a built-in agg function
Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076099552
@alamb @houqp PTAL
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] Dandandan commented on issue #2067: Support `sum0` as a built-in agg function
Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076780150
Even more simple/straightforward: `coalesce(sum(x), 0)` after `coalesce` is supported:
(https://github.com/apache/arrow-datafusion/pull/1969)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on issue #2067: Support `sum0` as a built-in agg function
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1077505505
Another potential option is to use a user defined aggregate for sum0
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] liukun4515 commented on issue #2067: Support `sum0` as a built-in agg function
Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076968836
> Even more simple/straightforward: `coalesce(sum(x), 0)` after `coalesce` is supported: (#1969)
I do some experiments on the spark using the `coalesce`
```
spark-sql> desc t2;
c1 int
```
spark plan:
```
spark-sql> explain select coalesce(sum(c1),0) from t2;
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[sum(c1#127)])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#252]
+- HashAggregate(keys=[], functions=[partial_sum(c1#127)])
+- Scan hive default.t2 [c1#127], HiveTableRelation [`default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [c1#127], Partition Cols: []]
```
In my user case, the function of `sum0` will be used in the ballista size.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb edited a comment on issue #2067: Support `sum0` as a built-in agg function
Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1077505505
Is the point that `sum0` is easier to optimize than `coalesce(sum(c1), 0)`?
Another potential option is to use a user defined aggregate for sum0.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org