Posted to github@arrow.apache.org by "allenma (via GitHub)" <gi...@apache.org> on 2023/04/11 04:53:24 UTC

[GitHub] [arrow-datafusion] allenma commented on pull request #5939: Count distinct support multiple expressions

allenma commented on PR #5939:
URL: https://github.com/apache/arrow-datafusion/pull/5939#issuecomment-1502685853

   @Dandandan @ozankabak, I ran the benchmark with the new implementation, and there is actually little performance degradation:
   ```
   Benchmarking aggregate_query_no_group_by_count_distinct_wide: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 61.0s, or reduce sample count to 10.
   aggregate_query_no_group_by_count_distinct_wide
                           time:   [587.01 ms 598.43 ms 611.68 ms]
                           change: [-6.8593% -3.2992% +0.1757%] (p = 0.08 > 0.05)
                           No change in performance detected.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   Benchmarking aggregate_query_no_group_by_count_distinct_narrow: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 40.1s, or reduce sample count to 10.
   aggregate_query_no_group_by_count_distinct_narrow
                           time:   [399.48 ms 415.63 ms 438.63 ms]
                           change: [-1.1234% +3.7277% +10.592%] (p = 0.20 > 0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
     2 (2.00%) high mild
     3 (3.00%) high severe
   ```
   I increased the test array size from 65536 to 134_217_728 to reduce environmental noise. The benchmark command is:
    cargo bench --bench aggregate_query_sql
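
For readers unfamiliar with the `change:` lines in the output above, here is a minimal sketch of how such a verdict can be derived from the reported interval and p-value. It is a simplified illustration, not Criterion's actual code: the `Verdict` enum and `classify` function are hypothetical names, and the 0.05 significance level and 0.01 noise threshold in the usage note are assumed to be the defaults in effect for this run.

```rust
/// Outcome of comparing a new benchmark run against a saved baseline.
/// (Hypothetical type for illustration; not part of Criterion's API.)
#[derive(Debug, PartialEq)]
enum Verdict {
    /// The p-value is not below the significance level.
    NoChange,
    /// Statistically significant, but the interval is not entirely
    /// beyond the noise threshold in either direction.
    WithinNoise,
    /// The whole interval lies below `-noise` (the run got faster).
    Improved,
    /// The whole interval lies above `+noise` (the run got slower).
    Regressed,
}

/// Classify a relative change in run time.
///
/// `lower` and `upper` bound the change as fractions, so -6.8593%
/// is written as -0.068593. `significance` and `noise` are the
/// assumed thresholds described in the text above.
fn classify(lower: f64, upper: f64, p_value: f64, significance: f64, noise: f64) -> Verdict {
    if p_value >= significance {
        return Verdict::NoChange;
    }
    if upper < -noise {
        Verdict::Improved
    } else if lower > noise {
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}
```

Under those assumptions, `classify(-0.068593, 0.001757, 0.08, 0.05, 0.01)` (the wide case) and `classify(-0.011234, 0.10592, 0.20, 0.05, 0.01)` (the narrow case) both yield `Verdict::NoChange`, because each p-value exceeds 0.05. This matches the "No change in performance detected" lines in the log.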
   

