You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/05 13:21:45 UTC

[GitHub] [arrow] Dandandan opened a new pull request #8837: ARROW-10810: [Rust] Speed up comparison kernels

Dandandan opened a new pull request #8837:
URL: https://github.com/apache/arrow/pull/8837


   This PR speeds up the (non-simd) comparison kernels by ~8 times:
   
   ```
   eq Float32              time:   [105.96 us 106.01 us 106.06 us]                       
                           change: [-82.463% -82.423% -82.378%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     1 (1.00%) low severe
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   eq scalar Float32       time:   [61.439 us 61.530 us 61.662 us]                              
                           change: [-88.282% -88.255% -88.221%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     1 (1.00%) low mild
     4 (4.00%) high mild
     5 (5.00%) high severe
   
   neq Float32             time:   [71.018 us 71.080 us 71.144 us]                        
                           change: [-86.580% -86.563% -86.546%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 15 outliers among 100 measurements (15.00%)
     2 (2.00%) low severe
     4 (4.00%) low mild
     5 (5.00%) high mild
     4 (4.00%) high severe
   
   neq scalar Float32      time:   [68.706 us 68.773 us 68.838 us]                               
                           change: [-86.207% -86.188% -86.171%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   lt Float32              time:   [70.655 us 70.703 us 70.753 us]                       
                           change: [-85.629% -85.617% -85.604%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   lt scalar Float32       time:   [50.626 us 50.664 us 50.698 us]                               
                           change: [-89.802% -89.764% -89.731%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   lt_eq Float32           time:   [101.34 us 101.43 us 101.51 us]                          
                           change: [-82.825% -82.797% -82.767%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   lt_eq scalar Float32    time:   [68.894 us 68.913 us 68.933 us]                                 
                           change: [-86.575% -86.557% -86.538%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low severe
     2 (2.00%) high mild
   
   gt Float32              time:   [71.260 us 71.332 us 71.400 us]                       
                           change: [-87.481% -87.450% -87.418%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   gt scalar Float32       time:   [38.852 us 38.888 us 38.929 us]                               
                           change: [-91.745% -91.733% -91.721%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     1 (1.00%) low severe
     11 (11.00%) high mild
     1 (1.00%) high severe
   
   gt_eq Float32           time:   [99.404 us 99.451 us 99.503 us]                          
                           change: [-80.870% -80.848% -80.827%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   gt_eq scalar Float32    time:   [55.892 us 55.926 us 55.963 us]                                 
                           change: [-88.783% -88.751% -88.727%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low severe
     1 (1.00%) low mild
     2 (2.00%) high mild
     2 (2.00%) high severe
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan closed pull request #8837: ARROW-10810: [Rust] Speed up comparison kernels

Posted by GitBox <gi...@apache.org>.
Dandandan closed pull request #8837:
URL: https://github.com/apache/arrow/pull/8837


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8837: ARROW-10810: [Rust] Speed up comparison kernels

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #8837:
URL: https://github.com/apache/arrow/pull/8837#issuecomment-739450290


   Let's close this for now in favor of #8842 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] removed a comment on pull request #8837: ARROW-10810: [Rust] Speed up comparison kernels

Posted by GitBox <gi...@apache.org>.
github-actions[bot] removed a comment on pull request #8837:
URL: https://github.com/apache/arrow/pull/8837#issuecomment-739250784


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8837: ARROW-10810: [Rust] Speed up comparison kernels

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8837:
URL: https://github.com/apache/arrow/pull/8837#issuecomment-739251667


   https://issues.apache.org/jira/browse/ARROW-10810


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on a change in pull request #8837: ARROW-10810: [Rust] Speed up comparison kernels

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #8837:
URL: https://github.com/apache/arrow/pull/8837#discussion_r536902120



##########
File path: rust/arrow/src/compute/kernels/comparison.rs
##########
@@ -68,9 +77,18 @@ macro_rules! compare_op {
 macro_rules! compare_op_scalar {
     ($left: expr, $right:expr, $op:expr) => {{
         let null_bit_buffer = $left.data().null_buffer().cloned();
-        let mut result = BooleanBufferBuilder::new($left.len());
+        let byte_capacity = bit_util::ceil($left.len(), 8);
+        let actual_capacity = bit_util::round_upto_multiple_of_64(byte_capacity);
+        let mut buffer = MutableBuffer::new(actual_capacity);
+        buffer.resize(byte_capacity);
+
+        let data = buffer.raw_data_mut();
         for i in 0..$left.len() {
-            result.append($op($left.value(i), $right))?;
+            if $op($left.value(i), $right) {
+                unsafe {
+                    bit_util::set_bit_raw(data, i);
+                }
+            }

Review comment:
       Yes, would be nice to see if we can get the combination of a clean/safe API and good performance as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8837: ARROW-10810: [Rust] Speed up comparison kernels

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8837:
URL: https://github.com/apache/arrow/pull/8837#discussion_r536875918



##########
File path: rust/arrow/src/compute/kernels/comparison.rs
##########
@@ -68,9 +77,18 @@ macro_rules! compare_op {
 macro_rules! compare_op_scalar {
     ($left: expr, $right:expr, $op:expr) => {{
         let null_bit_buffer = $left.data().null_buffer().cloned();
-        let mut result = BooleanBufferBuilder::new($left.len());
+        let byte_capacity = bit_util::ceil($left.len(), 8);
+        let actual_capacity = bit_util::round_upto_multiple_of_64(byte_capacity);
+        let mut buffer = MutableBuffer::new(actual_capacity);
+        buffer.resize(byte_capacity);
+
+        let data = buffer.raw_data_mut();
         for i in 0..$left.len() {
-            result.append($op($left.value(i), $right))?;
+            if $op($left.value(i), $right) {
+                unsafe {
+                    bit_util::set_bit_raw(data, i);
+                }
+            }

Review comment:
       A side note: In general, if we are bypassing builders for speed, it can mean that builders are not a good abstraction, as they significantly impact performance.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8837: ARROW-10810: [Rust] Speed up comparison kernels

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8837:
URL: https://github.com/apache/arrow/pull/8837#issuecomment-739250784


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org