You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/05 09:40:35 UTC

[GitHub] [arrow-rs] HaoYang670 opened a new issue #1400: Interesting benchmark results of `min_max_helper`

HaoYang670 opened a new issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400


   **Describe your question**
   I find some interesting benchmark results when I try to speed up the function `min_max_help`. https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/aggregate.rs#L115-L130
   
   The only thing that I rewrote is replacing `iter().fold()` by `iter().reduce()`:
   ```rust
       if null_count == 0 {
           // optimized path for arrays without null values
           m.iter()
               .reduce(|acc, item| if cmp(acc, item) { item } else { acc })
               .copied()
       } else {
           n = T::default_value();
           let mut has_value = false;
           for (i, item) in m.iter().enumerate() {
               if data.is_valid(i) && (!has_value || cmp(&n, item)) {
                   has_value = true;
                   n = *item
               }
           }
           Some(n)
       }
   ```
   
   Then I ran
   ```
   cargo bench min
   ```
   to find if there are any changes in performance.
   And I got the result:
   ```console
   min 512                 time:   [415.18 ns 415.66 ns 416.25 ns]                    
                           change: [-50.211% -50.109% -50.002%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 15 outliers among 100 measurements (15.00%)
     2 (2.00%) low mild
     7 (7.00%) high mild
     6 (6.00%) high severe
   
   min nulls 512           time:   [1.0308 us 1.0331 us 1.0356 us]                           
                           change: [+17.936% +18.114% +18.295%] (p = 0.00 < 0.05)
                           Performance has regressed.
   ```
   The results are a little unexpected. And I have 2 questions:
   1. Why does `min 512` have 50% performance improvement? I don't think `iter.reduce` is faster that `iter.fold`
   2. Why does `min nulls 512` become slower? The `null_count > 0` code block is not changed.
   
   Need your help!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] HaoYang670 edited a comment on issue #1400: Interesting benchmark results of `min_max_helper`

Posted by GitBox <gi...@apache.org>.
HaoYang670 edited a comment on issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059886865


   Also, using `copy` or `reference` seems to impact the performance of null handling loop. I get the following benchmark result:
   1. using `m.iter().copied().reduce(...)`:
   ```console
   min nulls 512           time:   [866.42 ns 867.35 ns 868.33 ns]                           
                           change: [-1.1100% -0.9494% -0.8086%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 9 outliers among 100 measurements (9.00%)
     7 (7.00%) high mild
     2 (2.00%) high severe
   ```
   3. using `m.iter().reduce(...).copied()`
   ```console
   
   min nulls 512           time:   [1.0525 us 1.0551 us 1.0576 us]                           
                           change: [+20.470% +20.793% +21.115%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   ```
   
   Is it able to reproduce these weird results on your laptop? @jhorstmann 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] HaoYang670 commented on issue #1400: Interesting benchmark results of `min_max_helper`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059753488


   > Interesting, I'm actually seeing the opposite effect, with `fold` being faster. This is running on an Amd 3700U laptop. The generated code for `fold` and `reduce` seems to differ in that `reduce` contains several branches while `fold` looks relatively branchless. Different CPUs might handle one or the other better, although I would expect the branchless to win in the benchmark since the input consists of random numbers. I don't have any good explanation why this effects the null handling loop.
   > 
   > `
   
   More interesting! My desktop uses Intel i7 10700K processors. 
   Also, I am curious about why the compiler generates different code for `fold` and `reduce`. The `reduce` just uses `fold` in its implementation:
   ```rust
       #[inline]
       #[stable(feature = "iterator_fold_self", since = "1.51.0")]
       fn reduce<F>(mut self, f: F) -> Option<Self::Item>
       where
           Self: Sized,
           F: FnMut(Self::Item, Self::Item) -> Self::Item,
       {
           let first = self.next()?;
           Some(self.fold(first, f))
       }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on issue #1400: Interesting benchmark results of `min_max_helper`

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059762317


   Good point, I tried some other variations and it seems to come down to the handling of references vs copying. The following two variations lead to different code:
   
   ```
   pub fn min_reduce_ref(array: &[f64]) -> Option<f64> {
       array.iter().reduce(|a, b| if cmp(a, b) { b } else { a }).copied()
   }
   
   pub fn min_reduce_copied(array: &[f64]) -> Option<f64> {
       array.iter().copied().reduce(|a, b| if cmp(a, b) { b } else { a })
   }
   ```
   
   The second one generates the same code that I saw for `fold` earlier. In theory it should be possible that both get optimized to the same code and this might be limitation in LLVM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on issue #1400: Interesting benchmark results of `min_max_helper`

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059743980


   Interesting, I'm actually seeing the opposite effect, with `fold` being faster. This is running on an Amd 3700U laptop. The generated code for `fold` and `reduce` seems to differ in that `reduce` contains several branches while `fold` looks relatively branchless. Different CPUs might handle one or the other better, although I would expect the branchless to win in the benchmark since the input consists of random numbers. I don't have any good explanation why this effects the null handling loop.
   
   `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] HaoYang670 commented on issue #1400: Interesting benchmark results of `min_max_helper`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059951540


   > I'm now actually a bit worried about the correctness of the nullable version, I don't see `has_value` being used apart from the assignment, if that flag is false then the result should be `None`.
   
   It can be made sure that there is at least one valid value in the array. Because we have tested `all nulls array` before:
   ```rust
   if null_count == array.len() {
           return None;
   }
   ```
   https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/aggregate.rs#L107-L109


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] HaoYang670 commented on issue #1400: Interesting benchmark results of `min_max_helper`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059886865


   Also, using `copy` or `reference` seems to impact the performance of null handling loop. I get the following benchmark result:
   1. using `m.iter().copied().reduce(...)`:
   ```console
   min nulls 512           time:   [866.42 ns 867.35 ns 868.33 ns]                           
                           change: [-1.1100% -0.9494% -0.8086%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 9 outliers among 100 measurements (9.00%)
     7 (7.00%) high mild
     2 (2.00%) high severe
   ```
   3. using `m.iter().reduce(...).copied()`
   ```console
   
   min nulls 512           time:   [1.0525 us 1.0551 us 1.0576 us]                           
                           change: [+20.470% +20.793% +21.115%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   ```
   
   Can you reproduce these weird results on your laptop? @jhorstmann 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on issue #1400: Interesting benchmark results of `min_max_helper`

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059949067


   I can reproduce both results and it seems AMD cpus are better at handling one version of the code and Intel better at the other version. For the non-null benchmarks I get:
   
   Intel 10510U
   copy + reduce: ~ 950ns
   reduce references + copy: ~450ns
   
   Amd 3700U (timings fluctuate a bit more on this laptop)
   copy + reduce: ~745ns
   reduce references + copy: ~970ns
   
   For `min nulls` I don't really see significant differences. Code alignment might create some small differences in performance on some cpus in such microbenchmarks. For even better performance, especially for nullable arrays, you should look into enabling the `simd` feature 
   
   I'm now actually a bit worried about the correctness of the nullable version, I don't see `has_value` being used apart from the assignment, if that flag is false then the result should be `None`.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] HaoYang670 commented on issue #1400: Interesting benchmark results of `min_max_helper`

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059882916


   Strongly agree with you @jhorstmann. `fold` seems to copy all elements in the array, and `reduce` just takes the references. If the code is rewritten as
   ```rust
       if null_count == 0 {
           // optimized path for arrays without null values
           m.iter()
               .copied()
               .reduce(|acc, item| if cmp(&acc, &item) { item } else { acc })
   ```
   and I can get the benchmark result:
   ```console
   min 512                 time:   [833.73 ns 834.34 ns 835.06 ns]                     
                           change: [-0.2047% -0.0757% +0.0383%] (p = 0.24 > 0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   ```
   , which is same as `fold`.
   
   But wait! Which one is faster in this context, copy or reference? On my desktop, I can get 50% performance improvement by using reference. However, you said `fold` was faster on your laptop. Is it a little weird?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] HaoYang670 edited a comment on issue #1400: Interesting benchmark results of `min_max_helper`

Posted by GitBox <gi...@apache.org>.
HaoYang670 edited a comment on issue #1400:
URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059951540


   > I'm now actually a bit worried about the correctness of the nullable version, I don't see `has_value` being used apart from the assignment, if that flag is false then the result should be `None`.
   
   There is at least one valid value in the array. Because we have tested `all nulls array` before:
   ```rust
   if null_count == array.len() {
           return None;
   }
   ```
   https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/aggregate.rs#L107-L109


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org