You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/23 08:47:05 UTC

[GitHub] [arrow-datafusion] liukun4515 opened a new issue #2067: Support `sum0` as a built-in agg function

liukun4515 opened a new issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   The hive and flink support the `sum0` function.
   
   https://github.com/apache/hive/blob/6f7c55ab9bc4fd7c3d0c2a6ba3095275b17b3d2d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveDruidRules.java#L210
   
   https://nightlies.apache.org/flink/flink-docs-release-1.3/api/java/org/apache/flink/table/functions/aggfunctions/Sum0AggFunction.html
   
   https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/systemfunctions/
   
   The `sum0` is very similar to `sum` except for all null values.
   
   If all values in the column are null, the result of `sum0` is `0`;
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on issue #2067: Support `sum0` as a built-in agg function

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1079807670


   > Another potential option is to use a user defined aggregate for sum0.
   
   Thanks.
   
   I will try these two methods.
   1. replace `sum0` with `coalesce` or `case when`
   2. udaf
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on issue #2067: Support `sum0` as a built-in agg function

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076964514


   @Dandandan @alamb Thanks for your suggestions.
   Please hold this issue and don't close.
   This feature is useful for me.
   
   I will try to replace `sum0` with your suggestion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #2067: Support `sum0` as a built-in agg function

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076759264


   Hi @liukun4515  I think you can get the same effect as `sum0` using a `CASE` statement, as shown below. Given the very minor difference between `sum` and `sum0` it seems like it is not a great idea to add it as a built in to DataFusion from my perspective. 
   
   ```sql
   CASE 
     WHEN sum(x) IS NULL 
       THEN 0 
       ELSE sum(x) 
     END as sum0
   ```
   
   A more full featured example:
   
   ```sql
   DataFusion CLI v7.0.0
   ❯ create table foo as select * from (values (1), (null), (null)) as sq;
   0 rows in set. Query took 0.044 seconds.
   
   ❯ select * from foo;
   +---------+
   | column1 |
   +---------+
   | 1       |
   |         |
   |         |
   +---------+
   3 rows in set. Query took 0.002 seconds.
   
   -- Results when there are only nulls
   ❯ select case when sum(column1) IS NULL THEN 0 ELSE sum(column1) END as sum0 from foo where column1 IS NULL;
   +------+
   | sum0 |
   +------+
   | 0    |
   +------+
   1 row in set. Query took 0.005 seconds.
   
   -- Results when there are nulls and non-null
   ❯ select case when sum(column1) IS NULL THEN 0 ELSE sum(column1) END as sum0 from foo;
   +------+
   | sum0 |
   +------+
   | 1    |
   +------+
   1 row in set. Query took 0.005 seconds.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on issue #2067: Support `sum0` as a built-in agg function

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076099552


   @alamb  @houqp  PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #2067: Support `sum0` as a built-in agg function

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076780150


   Even more simple/straightforward: `coalesce(sum(x), 0)` after `coalesce` is supported:
   (https://github.com/apache/arrow-datafusion/pull/1969)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #2067: Support `sum0` as a built-in agg function

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1077505505


   Another potential option is to use a user defined aggregate for sum0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on issue #2067: Support `sum0` as a built-in agg function

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1076968836


   > Even more simple/straightforward: `coalesce(sum(x), 0)` after `coalesce` is supported: (#1969)
   
   I do some experiments on the spark using the `coalesce`
   
   ```
   spark-sql> desc t2;
   c1                      int
   ```
   
   spark plan:
   
   ```
   spark-sql> explain select coalesce(sum(c1),0) from t2;
   == Physical Plan ==
   AdaptiveSparkPlan isFinalPlan=false
   +- HashAggregate(keys=[], functions=[sum(c1#127)])
      +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#252]
         +- HashAggregate(keys=[], functions=[partial_sum(c1#127)])
            +- Scan hive default.t2 [c1#127], HiveTableRelation [`default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [c1#127], Partition Cols: []]
   ```
   In my user case, the function of `sum0` will be used in the ballista size.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb edited a comment on issue #2067: Support `sum0` as a built-in agg function

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #2067:
URL: https://github.com/apache/arrow-datafusion/issues/2067#issuecomment-1077505505


   Is the point that `sum0` is easier to optimize than `coalesce(sum(c1), 0)`?
   
   Another potential option is to use a user defined aggregate for sum0. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org