You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/05 14:15:41 UTC

[GitHub] [arrow-datafusion] Jimexist opened a new issue #272: Support for aggregate function with filter

Jimexist opened a new issue #272:
URL: https://github.com/apache/arrow-datafusion/issues/272


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 
   (This section helps Arrow developers understand the context and *why* for this feature, in addition to  the *what*)
   
   A new challenge, to support aggregate function with filter. See [Postgres manual](https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES)
   
   **Describe the solution you'd like**
   A clear and concise description of what you want to happen.
   
   Exactly like [in the manual](https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES)
   
   **Describe alternatives you've considered**
   A clear and concise description of any alternative solutions or features you've considered.
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #272: Support for aggregate function with filter

Posted by GitBox <gi...@apache.org>.

Dandandan edited a comment on issue #272:
URL: https://github.com/apache/arrow-datafusion/issues/272#issuecomment-833826346


   @alamb yes, that's a great one - isn't that only feasible with `SUM`/`MAX`/`MIN`? I think primary use case would be `COUNT` or `AVG`
   
   Edit: yeah or simulating count using `SUM`, as I am seeing in your example. You could do something in the same way for `AVG`  of course if you are feeling creative :D


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] Dandandan commented on issue #272: Support for aggregate function with filter

Posted by GitBox <gi...@apache.org>.

Dandandan commented on issue #272:
URL: https://github.com/apache/arrow-datafusion/issues/272#issuecomment-832871245


   That would be a valuable feature @Jimexist 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #272: Support for aggregate function with filter

Posted by GitBox <gi...@apache.org>.

Dandandan edited a comment on issue #272:
URL: https://github.com/apache/arrow-datafusion/issues/272#issuecomment-833826346


   @alamb yes, that's a great one - isn't that only feasible with `SUM`/`MAX`/`MIN`? I think primary use case would be `COUNT` or `AVG`
   
   Edit: yeah or similulating count through sum, as I am seeing in your example.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] Dandandan commented on issue #272: Support for aggregate function with filter

Posted by GitBox <gi...@apache.org>.

Dandandan commented on issue #272:
URL: https://github.com/apache/arrow-datafusion/issues/272#issuecomment-833826346


   @alamb yes, that's a great one - isn't that only feasible with `SUM`/`MAX`/`MIN`? I think primary use case would be `COUNT` or `AVG`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #272: Support for aggregate function with filter

Posted by GitBox <gi...@apache.org>.

Dandandan edited a comment on issue #272:
URL: https://github.com/apache/arrow-datafusion/issues/272#issuecomment-833826346


   @alamb yes, that's a great one - isn't that only feasible with `SUM`/`MAX`/`MIN`? I think primary use case would be `COUNT` or `AVG`
   
   Edit: yeah or similulating count through sum, as I am seeing in your example. You could do something in the same way for `AVG`  of course if you are feeling creative :D


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on issue #272: Support for aggregate function with filter

Posted by GitBox <gi...@apache.org>.

alamb commented on issue #272:
URL: https://github.com/apache/arrow-datafusion/issues/272#issuecomment-833757037


   @Jimexist if you are proposing supporting this kind of syntax, then I think it is a reasonable idea. 
   
   ```
   SELECT
       count(*) AS unfiltered,
       count(*) FILTER (WHERE i < 5) AS filtered
   FROM generate_series(1,10) AS s(i);
    unfiltered | filtered
   ------------+----------
            10 |        4
   (1 row)
   ```
   
   For anyone else who might read this ticket, You can express the same type of query using SQL that data fusion already supports via CASE, though it is much less beautiful. Something like:
   
   ```shell
   $ seq 1 10 > /tmp/foo.csv
   $ cargo run -p datafusion-cli
   ```
   
   ```
   > CREATE EXTERNAL TABLE t(i double) STORED AS CSV LOCATION '/tmp/foo.csv';
   
   > select * from t;
   +----+
   | i  |
   +----+
   | 1  |
   | 2  |
   | 3  |
   | 4  |
   | 5  |
   | 6  |
   | 7  |
   | 8  |
   | 9  |
   | 10 |
   +----+
   
   >    select count(*) as unfiltered, sum(case when i < 5 then 1 else 0 end) as filtered from t;
   +------------+----------+
   | unfiltered | filtered |
   +------------+----------+
   | 10         | 4        |
   +------------+----------+
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org