You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/06 18:25:32 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #272: Support for aggregate function with filter

alamb commented on issue #272:
URL: https://github.com/apache/arrow-datafusion/issues/272#issuecomment-833757037


   @Jimexist if you are proposing supporting this kind of syntax, then I think it is a reasonable idea. 
   
   ```
   SELECT
       count(*) AS unfiltered,
       count(*) FILTER (WHERE i < 5) AS filtered
   FROM generate_series(1,10) AS s(i);
    unfiltered | filtered
   ------------+----------
            10 |        4
   (1 row)
   ```
   
   For anyone else who might read this ticket, You can express the same type of query using SQL that data fusion already supports via CASE, though it is much less beautiful. Something like:
   
   ```shell
   $ seq 1 10 > /tmp/foo.csv
   $ cargo run -p datafusion-cli
   ```
   
   ```
   > CREATE EXTERNAL TABLE t(i double) STORED AS CSV LOCATION '/tmp/foo.csv';
   
   > select * from t;
   +----+
   | i  |
   +----+
   | 1  |
   | 2  |
   | 3  |
   | 4  |
   | 5  |
   | 6  |
   | 7  |
   | 8  |
   | 9  |
   | 10 |
   +----+
   
   >    select count(*) as unfiltered, sum(case when i < 5 then 1 else 0 end) as filtered from t;
   +------------+----------+
   | unfiltered | filtered |
   +------------+----------+
   | 10         | 4        |
   +------------+----------+
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org