You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by tu...@apache.org on 2023/06/27 17:03:49 UTC

[arrow-rs] branch master updated: Convince the compiler to auto-vectorize the range check in parquet DictionaryBuffer (#4453)

This is an automated email from the ASF dual-hosted git repository.

tustvold pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/master by this push:
     new c1656ffea Convince the compiler to auto-vectorize the range check in parquet DictionaryBuffer (#4453)
c1656ffea is described below

commit c1656ffea5bba726d7af892e013b6c5b184dd3b4
Author: Jörn Horstmann <gi...@jhorstmann.net>
AuthorDate: Tue Jun 27 19:03:43 2023 +0200

    Convince the compiler to auto-vectorize the range check in parquet DictionaryBuffer (#4453)
---
 parquet/src/arrow/buffer/dictionary_buffer.rs | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/parquet/src/arrow/buffer/dictionary_buffer.rs b/parquet/src/arrow/buffer/dictionary_buffer.rs
index 6344d9dd3..a0a47e3b9 100644
--- a/parquet/src/arrow/buffer/dictionary_buffer.rs
+++ b/parquet/src/arrow/buffer/dictionary_buffer.rs
@@ -152,8 +152,15 @@ impl<K: ScalarValue + ArrowNativeType + Ord, V: ScalarValue + OffsetSizeTrait>
                     let min = K::from_usize(0).unwrap();
                     let max = K::from_usize(values.len()).unwrap();
 
-                    // It may be possible to use SIMD here
-                    if keys.as_slice().iter().any(|x| *x < min || *x >= max) {
+                    // using copied and fold gets auto-vectorized since rust 1.70
+                    // all/any would allow early exit on invalid values
+                    // but in the happy case all values have to be checked anyway
+                    if !keys
+                        .as_slice()
+                        .iter()
+                        .copied()
+                        .fold(true, |a, x| a && x >= min && x < max)
+                    {
                         return Err(general_err!(
                             "dictionary key beyond bounds of dictionary: 0..{}",
                             values.len()