Posted to github@arrow.apache.org by "ozgrakkurt (via GitHub)" <gi...@apache.org> on 2023/06/21 18:35:32 UTC

[GitHub] [arrow-rs] ozgrakkurt opened a new pull request, #4441: use optimized bloom filter

ozgrakkurt opened a new pull request, #4441:
URL: https://github.com/apache/arrow-rs/pull/4441

   Closes #4213 
   
   Uses the bloom filter implementation from a different crate. That crate uses CPU-specific optimized implementations, so it is faster than the old implementation. [Benchmarks are in this repo](https://github.com/ozgrakkurt/sbbf-rs).
   
   I am the author of the library that this PR uses.
   
   There are no user-facing changes.




[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1609749650

   Also, the compiler can't enable AVX unless the code is compiled with `target-cpu=native`, which is very unlikely in cloud deployments. But it can do pretty well with aarch64 and SSE, I think.
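   
   As an illustration of the runtime-dispatch idea (a standalone, std-only sketch, not code from this PR or from `sbbf-rs`), the SIMD level the running CPU actually supports can be detected at runtime instead of being fixed by compile-time flags:
   
   ```rust
   // Pick a SIMD level based on what the running CPU reports, independent of
   // the compile-time target-cpu setting.
   fn detected_simd_level() -> &'static str {
       #[cfg(target_arch = "x86_64")]
       {
           if std::is_x86_feature_detected!("avx2") {
               return "avx2";
           }
           if std::is_x86_feature_detected!("sse4.1") {
               return "sse4.1";
           }
       }
       #[cfg(target_arch = "aarch64")]
       {
           if std::arch::is_aarch64_feature_detected!("neon") {
               return "neon";
           }
       }
       "fallback"
   }
   ```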
   
   <details>
   
   <summary>benchmarks on 4 AMD EPYC cores</summary>
   
   parquet2 insert         time:   [21.260 ns 21.743 ns 22.242 ns]
                           change: [-6.7189% -4.0742% -1.4153%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   parquet insert          time:   [14.298 ns 14.501 ns 14.721 ns]
                           change: [-0.4325% +1.8520% +4.1458%] (p = 0.12 > 0.05)
                           No change in performance detected.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   sbbf-rs insert          time:   [3.5763 ns 3.6755 ns 3.8263 ns]
                           change: [-0.4673% +3.0805% +7.1155%] (p = 0.11 > 0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   parquet2 contains       time:   [6.1688 ns 6.2463 ns 6.3320 ns]
                           change: [-0.4251% +1.4917% +3.4157%] (p = 0.12 > 0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   
   parquet contains        time:   [6.4125 ns 6.5060 ns 6.6049 ns]
                           change: [-1.6971% +0.1722% +1.9752%] (p = 0.86 > 0.05)
                           No change in performance detected.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   sbbf-rs contains        time:   [2.5523 ns 2.6273 ns 2.7069 ns]
                           change: [-3.0901% +1.4871% +6.1788%] (p = 0.53 > 0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
   </details>




[GitHub] [arrow-rs] alamb commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1613541725

   > I'm somewhat apprehensive about making our default impl be based off a crate with an unclear long-term maintenance story.
   
   I am likewise apprehensive
   




[GitHub] [arrow-rs] ozgrakkurt commented on a diff in pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on code in PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#discussion_r1238677101


##########
parquet/Cargo.toml:
##########
@@ -64,8 +64,9 @@ seq-macro = { version = "0.3", default-features = false }
 futures = { version = "0.3", default-features = false, features = ["std"], optional = true }
 tokio = { version = "1.0", optional = true, default-features = false, features = ["macros", "rt", "io-util"] }
 hashbrown = { version = "0.14", default-features = false }
-twox-hash = { version = "1.6", default-features = false }
+xxhash-rust = { version = "0.8", features = ["xxh64"] }
 paste = { version = "1.0" }
+sbbf-rs-safe = "0.2"

Review Comment:
   I would like to keep it separate since it can be used in a lot of other applications; there are a lot of bloom filter libraries for Rust, but none of them implement the split-block variant AFAIK.





[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1603260786

   I fixed the compilation issue on wasm and fixed fmt, but I'm not sure about the illegal instruction error on the macOS CI. Maybe it reports itself as `aarch64` but doesn't have `NEON` instructions?




[GitHub] [arrow-rs] ozgrakkurt commented on a diff in pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on code in PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#discussion_r1244970105


##########
parquet/src/bloom_filter/mod.rs:
##########
@@ -27,111 +27,19 @@ use crate::format::{
     SplitBlockAlgorithm, Uncompressed, XxHash,
 };
 use bytes::{Buf, Bytes};
-use std::hash::Hasher;
+use sbbf_rs_safe::Filter;
 use std::io::Write;
 use std::sync::Arc;
 use thrift::protocol::{
     TCompactInputProtocol, TCompactOutputProtocol, TOutputProtocol, TSerializable,
 };
-use twox_hash::XxHash64;
-
-/// Salt as defined in the [spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach).
-const SALT: [u32; 8] = [
-    0x47b6137b_u32,
-    0x44974d91_u32,
-    0x8824ad5b_u32,
-    0xa2b7289d_u32,
-    0x705495c7_u32,
-    0x2df1424b_u32,
-    0x9efc4947_u32,
-    0x5c6bfb31_u32,
-];
-
-/// Each block is 256 bits, broken up into eight contiguous "words", each consisting of 32 bits.
-/// Each word is thought of as an array of bits; each bit is either "set" or "not set".
-#[derive(Debug, Copy, Clone)]
-struct Block([u32; 8]);
-impl Block {
-    const ZERO: Block = Block([0; 8]);
-
-    /// takes as its argument a single unsigned 32-bit integer and returns a block in which each
-    /// word has exactly one bit set.
-    fn mask(x: u32) -> Self {
-        let mut result = [0_u32; 8];
-        for i in 0..8 {
-            // wrapping instead of checking for overflow
-            let y = x.wrapping_mul(SALT[i]);
-            let y = y >> 27;
-            result[i] = 1 << y;
-        }
-        Self(result)
-    }
-
-    #[inline]
-    #[cfg(target_endian = "little")]
-    fn to_le_bytes(self) -> [u8; 32] {
-        self.to_ne_bytes()
-    }
-
-    #[inline]
-    #[cfg(not(target_endian = "little"))]
-    fn to_le_bytes(self) -> [u8; 32] {
-        self.swap_bytes().to_ne_bytes()
-    }
-
-    #[inline]
-    fn to_ne_bytes(self) -> [u8; 32] {
-        unsafe { std::mem::transmute(self) }
-    }
-
-    #[inline]
-    #[cfg(not(target_endian = "little"))]
-    fn swap_bytes(mut self) -> Self {
-        self.0.iter_mut().for_each(|x| *x = x.swap_bytes());
-        self
-    }
-
-    /// setting every bit in the block that was also set in the result from mask
-    fn insert(&mut self, hash: u32) {
-        let mask = Self::mask(hash);
-        for i in 0..8 {
-            self[i] |= mask[i];
-        }
-    }
-
-    /// returns true when every bit that is set in the result of mask is also set in the block.
-    fn check(&self, hash: u32) -> bool {
-        let mask = Self::mask(hash);
-        for i in 0..8 {
-            if self[i] & mask[i] == 0 {
-                return false;
-            }
-        }
-        true
-    }
-}
-
-impl std::ops::Index<usize> for Block {
-    type Output = u32;
-
-    #[inline]
-    fn index(&self, index: usize) -> &Self::Output {
-        self.0.index(index)
-    }
-}
-
-impl std::ops::IndexMut<usize> for Block {
-    #[inline]
-    fn index_mut(&mut self, index: usize) -> &mut Self::Output {
-        self.0.index_mut(index)
-    }
-}
+use xxhash_rust::xxh64::xxh64;

Review Comment:
   Also realised I don't use the sbbf filter properly in this PR branch: it allocates a new buffer when `Sbbf::new` is called, which ideally shouldn't be the case. It could just take a borrow of the `bitset`, assuming it is aligned to 64 bits.





[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1610313921

   On `aarch64`
   
   <details>
   
   <summary>Running `cargo bench` under parquet folder, first with master branch and then sbbf branch</summary>
   
   ```
   write_batch primitive/4096 values primitive
                           time:   [861.79 µs 873.72 µs 885.48 µs]
                           thrpt:  [198.69 MiB/s 201.36 MiB/s 204.15 MiB/s]
                    change:
                           time:   [+2.1325% +3.2924% +4.6073%] (p = 0.00 < 0.05)
                           thrpt:  [-4.4044% -3.1875% -2.0880%]
                           Performance has regressed.
   write_batch primitive/4096 values primitive with bloom filter
                           time:   [3.8160 ms 3.8739 ms 3.9339 ms]
                           thrpt:  [44.722 MiB/s 45.415 MiB/s 46.105 MiB/s]
                    change:
                           time:   [-32.210% -31.013% -29.875%] (p = 0.00 < 0.05)
                           thrpt:  [+42.603% +44.955% +47.515%]
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   write_batch primitive/4096 values primitive non-null
                           time:   [734.01 µs 739.56 µs 746.42 µs]
                           thrpt:  [231.12 MiB/s 233.27 MiB/s 235.03 MiB/s]
                    change:
                           time:   [+2.7312% +3.4277% +4.2323%] (p = 0.00 < 0.05)
                           thrpt:  [-4.0604% -3.3141% -2.6586%]
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   write_batch primitive/4096 values primitive non-null with bloom filter
                           time:   [3.3841 ms 3.4255 ms 3.4691 ms]
                           thrpt:  [49.729 MiB/s 50.362 MiB/s 50.978 MiB/s]
                    change:
                           time:   [-39.074% -38.127% -37.236%] (p = 0.00 < 0.05)
                           thrpt:  [+59.327% +61.621% +64.133%]
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     6 (6.00%) high mild
   write_batch primitive/4096 values bool
                           time:   [98.046 µs 98.965 µs 100.33 µs]
                           thrpt:  [10.570 MiB/s 10.716 MiB/s 10.816 MiB/s]
                    change:
                           time:   [+2.9694% +3.9493% +5.1195%] (p = 0.00 < 0.05)
                           thrpt:  [-4.8702% -3.7992% -2.8837%]
                           Performance has regressed.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   write_batch primitive/4096 values bool non-null
                           time:   [78.075 µs 79.162 µs 80.149 µs]
                           thrpt:  [7.1393 MiB/s 7.2282 MiB/s 7.3289 MiB/s]
                    change:
                           time:   [+0.0433% +2.4260% +5.4192%] (p = 0.07 > 0.05)
                           thrpt:  [-5.1406% -2.3685% -0.0432%]
                           No change in performance detected.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   write_batch primitive/4096 values string
                           time:   [403.61 µs 405.73 µs 408.72 µs]
                           thrpt:  [194.39 MiB/s 195.83 MiB/s 196.85 MiB/s]
                    change:
                           time:   [+1.4492% +2.3498% +3.5335%] (p = 0.00 < 0.05)
                           thrpt:  [-3.4129% -2.2959% -1.4285%]
                           Performance has regressed.
   Found 12 outliers among 100 measurements (12.00%)
     8 (8.00%) high mild
     4 (4.00%) high severe
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
   write_batch primitive/4096 values string with bloom filter
                           time:   [1.1055 ms 1.1382 ms 1.1729 ms]
                           thrpt:  [67.741 MiB/s 69.804 MiB/s 71.873 MiB/s]
                    change:
                           time:   [-37.247% -35.553% -33.810%] (p = 0.00 < 0.05)
                           thrpt:  [+51.079% +55.167% +59.354%]
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   write_batch primitive/4096 values string dictionary
                           time:   [229.68 µs 230.35 µs 231.20 µs]
                           thrpt:  [207.00 MiB/s 207.77 MiB/s 208.37 MiB/s]
                    change:
                           time:   [+0.2089% +1.0188% +2.0040%] (p = 0.02 < 0.05)
                           thrpt:  [-1.9646% -1.0085% -0.2085%]
                           Change within noise threshold.
   Found 13 outliers among 100 measurements (13.00%)
     4 (4.00%) high mild
     9 (9.00%) high severe
   write_batch primitive/4096 values string dictionary with bloom filter
                           time:   [535.90 µs 545.12 µs 555.73 µs]
                           thrpt:  [86.120 MiB/s 87.796 MiB/s 89.305 MiB/s]
                    change:
                           time:   [-42.792% -40.776% -38.804%] (p = 0.00 < 0.05)
                           thrpt:  [+63.410% +68.850% +74.801%]
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) high mild
     4 (4.00%) high severe
   write_batch primitive/4096 values string non-null
                           time:   [483.04 µs 485.05 µs 487.48 µs]
                           thrpt:  [160.98 MiB/s 161.79 MiB/s 162.46 MiB/s]
                    change:
                           time:   [-2.2902% -1.5725% -0.8595%] (p = 0.00 < 0.05)
                           thrpt:  [+0.8669% +1.5976% +2.3439%]
                           Change within noise threshold.
   Found 16 outliers among 100 measurements (16.00%)
     6 (6.00%) high mild
     10 (10.00%) high severe
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
   write_batch primitive/4096 values string non-null with bloom filter
                           time:   [1.1462 ms 1.1802 ms 1.2178 ms]
                           thrpt:  [64.442 MiB/s 66.495 MiB/s 68.466 MiB/s]
                    change:
                           time:   [-39.081% -37.298% -35.342%] (p = 0.00 < 0.05)
                           thrpt:  [+54.661% +59.483% +64.152%]
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     7 (7.00%) high mild
     2 (2.00%) high severe
   
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
   write_batch nested/4096 values primitive list
                           time:   [1.1292 ms 1.1307 ms 1.1322 ms]
                           thrpt:  [144.23 MiB/s 144.42 MiB/s 144.60 MiB/s]
                    change:
                           time:   [-1.3286% -0.8860% -0.4412%] (p = 0.00 < 0.05)
                           thrpt:  [+0.4431% +0.8939% +1.3465%]
                           Change within noise threshold.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     2 (2.00%) high severe
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50.
   write_batch nested/4096 values primitive list non-null
                           time:   [1.4193 ms 1.4229 ms 1.4272 ms]
                           thrpt:  [133.14 MiB/s 133.54 MiB/s 133.88 MiB/s]
                    change:
                           time:   [+4.4098% +4.7487% +5.1370%] (p = 0.00 < 0.05)
                           thrpt:  [-4.8860% -4.5335% -4.2235%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) low mild
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   
   ```
   
   </details>
   
   So something like a 35% gain across the board on the bloom filter benchmarks.
   I'll also test on an x86 machine in a little bit.




[GitHub] [arrow-rs] ozgrakkurt commented on a diff in pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on code in PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#discussion_r1244953147


##########
parquet/src/bloom_filter/mod.rs:
##########
@@ -27,111 +27,19 @@ use crate::format::{
     SplitBlockAlgorithm, Uncompressed, XxHash,
 };
 use bytes::{Buf, Bytes};
-use std::hash::Hasher;
+use sbbf_rs_safe::Filter;
 use std::io::Write;
 use std::sync::Arc;
 use thrift::protocol::{
     TCompactInputProtocol, TCompactOutputProtocol, TOutputProtocol, TSerializable,
 };
-use twox_hash::XxHash64;
-
-/// Salt as defined in the [spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach).
-const SALT: [u32; 8] = [
-    0x47b6137b_u32,
-    0x44974d91_u32,
-    0x8824ad5b_u32,
-    0xa2b7289d_u32,
-    0x705495c7_u32,
-    0x2df1424b_u32,
-    0x9efc4947_u32,
-    0x5c6bfb31_u32,
-];
-
-/// Each block is 256 bits, broken up into eight contiguous "words", each consisting of 32 bits.
-/// Each word is thought of as an array of bits; each bit is either "set" or "not set".
-#[derive(Debug, Copy, Clone)]
-struct Block([u32; 8]);
-impl Block {
-    const ZERO: Block = Block([0; 8]);
-
-    /// takes as its argument a single unsigned 32-bit integer and returns a block in which each
-    /// word has exactly one bit set.
-    fn mask(x: u32) -> Self {
-        let mut result = [0_u32; 8];
-        for i in 0..8 {
-            // wrapping instead of checking for overflow
-            let y = x.wrapping_mul(SALT[i]);
-            let y = y >> 27;
-            result[i] = 1 << y;
-        }
-        Self(result)
-    }
-
-    #[inline]
-    #[cfg(target_endian = "little")]
-    fn to_le_bytes(self) -> [u8; 32] {
-        self.to_ne_bytes()
-    }
-
-    #[inline]
-    #[cfg(not(target_endian = "little"))]
-    fn to_le_bytes(self) -> [u8; 32] {
-        self.swap_bytes().to_ne_bytes()
-    }
-
-    #[inline]
-    fn to_ne_bytes(self) -> [u8; 32] {
-        unsafe { std::mem::transmute(self) }
-    }
-
-    #[inline]
-    #[cfg(not(target_endian = "little"))]
-    fn swap_bytes(mut self) -> Self {
-        self.0.iter_mut().for_each(|x| *x = x.swap_bytes());
-        self
-    }
-
-    /// setting every bit in the block that was also set in the result from mask
-    fn insert(&mut self, hash: u32) {
-        let mask = Self::mask(hash);
-        for i in 0..8 {
-            self[i] |= mask[i];
-        }
-    }
-
-    /// returns true when every bit that is set in the result of mask is also set in the block.
-    fn check(&self, hash: u32) -> bool {
-        let mask = Self::mask(hash);
-        for i in 0..8 {
-            if self[i] & mask[i] == 0 {
-                return false;
-            }
-        }
-        true
-    }
-}
-
-impl std::ops::Index<usize> for Block {
-    type Output = u32;
-
-    #[inline]
-    fn index(&self, index: usize) -> &Self::Output {
-        self.0.index(index)
-    }
-}
-
-impl std::ops::IndexMut<usize> for Block {
-    #[inline]
-    fn index_mut(&mut self, index: usize) -> &mut Self::Output {
-        self.0.index_mut(index)
-    }
-}
+use xxhash_rust::xxh64::xxh64;

Review Comment:
   Also, in my `insert` bench I also call `contains` on the `parquet` impl. But there is a big difference in the integration tests that write parquet with bloom filters.





[GitHub] [arrow-rs] jhorstmann commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "jhorstmann (via GitHub)" <gi...@apache.org>.
jhorstmann commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1609469322

   I would actually expect the existing code to auto-vectorize and generate nearly the same assembly as using intrinsics.
   
   Maybe `check` needs a tiny refactoring so it doesn't do that early exit, though: https://rust.godbolt.org/z/GG6YqTena
   
   Are there more optimizations in the crate that I'm missing?
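   
   Not the exact snippet from the godbolt link, but a minimal sketch of the suggested refactoring, reusing the SALT/Block layout from the diff above: accumulate all eight word checks instead of returning early, so the loop can collapse into one vectorized compare.
   
   ```rust
   /// Salt constants from the split-block bloom filter spec (same values as
   /// in the removed code above).
   const SALT: [u32; 8] = [
       0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
       0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
   ];
   
   /// Branchless version of `Block::check`: no early exit, just a running
   /// conjunction over the eight words.
   fn check_branchless(block: &[u32; 8], hash: u32) -> bool {
       let mut all_set = true;
       for i in 0..8 {
           let bit = 1u32 << (hash.wrapping_mul(SALT[i]) >> 27);
           all_set &= (block[i] & bit) != 0;
       }
       all_set
   }
   ```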




[GitHub] [arrow-rs] alamb commented on a diff in pull request #4441: perf(parquet): use optimized bloom filter

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#discussion_r1238631719


##########
parquet/Cargo.toml:
##########
@@ -64,8 +64,9 @@ seq-macro = { version = "0.3", default-features = false }
 futures = { version = "0.3", default-features = false, features = ["std"], optional = true }
 tokio = { version = "1.0", optional = true, default-features = false, features = ["macros", "rt", "io-util"] }
 hashbrown = { version = "0.14", default-features = false }
-twox-hash = { version = "1.6", default-features = false }
+xxhash-rust = { version = "0.8", features = ["xxh64"] }
 paste = { version = "1.0" }
+sbbf-rs-safe = "0.2"

Review Comment:
   Rather than using a new crate (that would add a new dependency and need to be maintained, etc) what do you think about inlining your implementation into this repository?





[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1611239032

   > Can we also compare the current version to the fallback version in performance in `sbbf-rs-safe`? It looks like in `https://github.com/ozgrakkurt/sbbf-rs/blob/master/src/arch/fallback/parquet2_impl.rs` there are a couple of possible regressions, i.e. for `insert`
   
   The fallback version is basically the `parquet2` row in the benchmarks I posted.




[GitHub] [arrow-rs] ozgrakkurt closed pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt closed pull request #4441: perf(parquet): use optimized bloom filter
URL: https://github.com/apache/arrow-rs/pull/4441






[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1609930665

   > If auto-vectorization / AVX is of concern, we could also bring back feature detection using the `multiversion` crate.
   > 
   > I also know of a few (cloud) users who are enabling a specific cpu target, e.g. by setting `target-cpu=skylake`, as the instance type / CPU is often known in advance.
   
   With `target-cpu=native` on an AVX2-enabled CPU, the library is 3x as fast on `insert` and 1.4x as fast on `contains`. The library code is also very easy to use, without any requirement on CPU targets or anything like that.




[GitHub] [arrow-rs] ozgrakkurt commented on a diff in pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on code in PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#discussion_r1244951895


##########
parquet/src/bloom_filter/mod.rs:
##########
@@ -27,111 +27,19 @@ use crate::format::{
     SplitBlockAlgorithm, Uncompressed, XxHash,
 };
 use bytes::{Buf, Bytes};
-use std::hash::Hasher;
+use sbbf_rs_safe::Filter;
 use std::io::Write;
 use std::sync::Arc;
 use thrift::protocol::{
     TCompactInputProtocol, TCompactOutputProtocol, TOutputProtocol, TSerializable,
 };
-use twox_hash::XxHash64;
-
-/// Salt as defined in the [spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach).
-const SALT: [u32; 8] = [
-    0x47b6137b_u32,
-    0x44974d91_u32,
-    0x8824ad5b_u32,
-    0xa2b7289d_u32,
-    0x705495c7_u32,
-    0x2df1424b_u32,
-    0x9efc4947_u32,
-    0x5c6bfb31_u32,
-];
-
-/// Each block is 256 bits, broken up into eight contiguous "words", each consisting of 32 bits.
-/// Each word is thought of as an array of bits; each bit is either "set" or "not set".
-#[derive(Debug, Copy, Clone)]
-struct Block([u32; 8]);
-impl Block {
-    const ZERO: Block = Block([0; 8]);
-
-    /// takes as its argument a single unsigned 32-bit integer and returns a block in which each
-    /// word has exactly one bit set.
-    fn mask(x: u32) -> Self {
-        let mut result = [0_u32; 8];
-        for i in 0..8 {
-            // wrapping instead of checking for overflow
-            let y = x.wrapping_mul(SALT[i]);
-            let y = y >> 27;
-            result[i] = 1 << y;
-        }
-        Self(result)
-    }
-
-    #[inline]
-    #[cfg(target_endian = "little")]
-    fn to_le_bytes(self) -> [u8; 32] {
-        self.to_ne_bytes()
-    }
-
-    #[inline]
-    #[cfg(not(target_endian = "little"))]
-    fn to_le_bytes(self) -> [u8; 32] {
-        self.swap_bytes().to_ne_bytes()
-    }
-
-    #[inline]
-    fn to_ne_bytes(self) -> [u8; 32] {
-        unsafe { std::mem::transmute(self) }
-    }
-
-    #[inline]
-    #[cfg(not(target_endian = "little"))]
-    fn swap_bytes(mut self) -> Self {
-        self.0.iter_mut().for_each(|x| *x = x.swap_bytes());
-        self
-    }
-
-    /// setting every bit in the block that was also set in the result from mask
-    fn insert(&mut self, hash: u32) {
-        let mask = Self::mask(hash);
-        for i in 0..8 {
-            self[i] |= mask[i];
-        }
-    }
-
-    /// returns true when every bit that is set in the result of mask is also set in the block.
-    fn check(&self, hash: u32) -> bool {
-        let mask = Self::mask(hash);
-        for i in 0..8 {
-            if self[i] & mask[i] == 0 {
-                return false;
-            }
-        }
-        true
-    }
-}
-
-impl std::ops::Index<usize> for Block {
-    type Output = u32;
-
-    #[inline]
-    fn index(&self, index: usize) -> &Self::Output {
-        self.0.index(index)
-    }
-}
-
-impl std::ops::IndexMut<usize> for Block {
-    #[inline]
-    fn index_mut(&mut self, index: usize) -> &mut Self::Output {
-        self.0.index_mut(index)
-    }
-}
+use xxhash_rust::xxh64::xxh64;

Review Comment:
   yeah, forgot about that





[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1609483886

   I didn't check the assembly code myself. My implementation is adapted from the C impl in Impala. There is a big difference in the benchmarks, so I don't think it optimizes to the same assembly, especially for AVX2.
   




[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1611299727

   @Dandandan I updated the fallback version and made a release; the code is here: https://github.com/ozgrakkurt/sbbf-rs/blob/master/src/arch/fallback/parquet_impl.rs
   
   Not sure how it will perform, since I modified it to fit the API of the library.




[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1604368062

   The illegal instruction issue should be fixed; could you run the tests again?




[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1611324153

   > Thank you very much @ozgrakkurt I think it makes sense to use the crate as it seems we can't get similar performance relying on auto vectorization only in this case.
   
   no problem, thanks as well! 




[GitHub] [arrow-rs] Dandandan commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1611323104

   Thank you very much  @ozgrakkurt I think it makes sense to use the crate as it seems we can't get similar performance relying on auto vectorization only in this case.




[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1615851369

   > Having thought about this more, I think we need to make this an optional feature that people explicitly opt-in to
   
   done, can you check?




[GitHub] [arrow-rs] jhorstmann commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "jhorstmann (via GitHub)" <gi...@apache.org>.
jhorstmann commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1610027588

   > I also know of a few (cloud) users who are enabling a specific cpu target
   
   We certainly do; especially in a cloud or server-side environment we want to make the best use of the available hardware. So our use case does not benefit from runtime dispatching, but I understand this might not be the main use case. I'm also a huge proponent of using the available instruction set, if necessary via intrinsics. In many cases, though, the compiler generates just as good assembly without specialized instructions. Sometimes this requires a bit of restructuring or careful use of unsafe, but simple loops like 0..8 usually work fine, and the Rust code is then more maintainable than the intrinsics.
   
   I don't have a strong opinion on the bloom filter code; using a crate would also make sense instead of implementing it in the arrow project itself. On the other hand, minimizing dependencies to a small, well-known set is also a good goal and something that some commercial users might care about.
   
   Regarding benchmarks, I think the existing `arrow_writer` benchmark includes code to enable bloom filters; it would be interesting to see the performance benefit in such a slightly bigger context.
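   
   For reference, a minimal sketch of enabling bloom filters on the writer, as that benchmark does; the builder method names are recalled from the parquet crate of this period and should be treated as assumptions rather than quoted from the benchmark code:
   
   ```rust
   use parquet::file::properties::WriterProperties;
   
   // Writer properties that turn on SBBF bloom filters for all columns.
   fn bloom_filter_props() -> WriterProperties {
       WriterProperties::builder()
           .set_bloom_filter_enabled(true)   // write a filter per column chunk
           .set_bloom_filter_fpp(0.05)       // target false-positive probability
           .set_bloom_filter_ndv(1_000_000)  // expected distinct values
           .build()
   }
   ```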




[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1610347146

   Regarding AVX in distributed binaries: https://github.com/ClickHouse/ClickHouse/issues/40459




[GitHub] [arrow-rs] Dandandan commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1609922563

   If auto-vectorization / AVX is of concern, we could also bring back feature detection using the `multiversion` crate.
   
   I also know of a few (cloud) users who are enabling a specific cpu target, e.g. by setting `target-cpu=skylake`, as the instance type / CPU is often known in advance.




[GitHub] [arrow-rs] Dandandan commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1604379166

   Rerunning




[GitHub] [arrow-rs] alamb commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1602762757

   cc @JayjeetAtGithub and @crepererum   (as I think you may be interested in this area of parquet)




[GitHub] [arrow-rs] ozgrakkurt commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1605454252

   I added CI for `aarch64` and `wasm`, so anyone can check that it indeed works on all target platforms: [repo](https://github.com/ozgrakkurt/sbbf-rs).
   I also added an optimized `wasm` version that requires the nightly compiler and manually enabling the `simd128` feature.




[GitHub] [arrow-rs] crepererum commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "crepererum (via GitHub)" <gi...@apache.org>.
crepererum commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1604096258

   The results [here](https://github.com/apache/arrow-rs/issues/4213#issuecomment-1599441632) look good, and I think having this in a separate crate is totally OK. If this is really perf-critical for anyone, I could imagine the impl. gets reasonably complex (incl. assembly), which we probably don't wanna have in core parquet-rs.




[GitHub] [arrow-rs] Dandandan commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1611232919

   Can we also compare the current version to the fallback version in performance in `sbbf-rs-safe`?
   It looks like in `https://github.com/ozgrakkurt/sbbf-rs/blob/master/src/arch/fallback/parquet2_impl.rs` there are a couple of possible regressions, i.e. for `insert`




[GitHub] [arrow-rs] tustvold commented on pull request #4441: perf(parquet): use optimized bloom filter

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#issuecomment-1613005438

   I'm somewhat apprehensive about making our default impl be based off a crate with an unclear long-term maintenance story. Perhaps we could feature gate this with it disabled by default to allow people to opt-in?
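   
   A hypothetical sketch of the opt-in shape (the feature name and module names here are illustrative, not what the PR adds): the existing pure-Rust implementation stays the default, and the `sbbf-rs-safe`-backed one is only compiled when the feature is enabled.
   
   ```rust
   // Default: keep the existing SALT/Block implementation.
   #[cfg(not(feature = "sbbf"))]
   mod block_impl {
       // ... existing Block-based bitset code ...
   }
   
   // Opt-in: thin wrapper delegating insert/check to sbbf_rs_safe::Filter.
   #[cfg(feature = "sbbf")]
   mod sbbf_impl {
       // ... sbbf_rs_safe-backed code ...
   }
   ```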




[GitHub] [arrow-rs] Dandandan commented on a diff in pull request #4441: perf(parquet): use optimized bloom filter

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on code in PR #4441:
URL: https://github.com/apache/arrow-rs/pull/4441#discussion_r1244888913


##########
parquet/src/bloom_filter/mod.rs:
##########
@@ -27,111 +27,19 @@ use crate::format::{
     SplitBlockAlgorithm, Uncompressed, XxHash,
 };
 use bytes::{Buf, Bytes};
-use std::hash::Hasher;
+use sbbf_rs_safe::Filter;
 use std::io::Write;
 use std::sync::Arc;
 use thrift::protocol::{
     TCompactInputProtocol, TCompactOutputProtocol, TOutputProtocol, TSerializable,
 };
-use twox_hash::XxHash64;
-
-/// Salt as defined in the [spec](https://github.com/apache/parquet-format/blob/master/BloomFilter.md#technical-approach).
-const SALT: [u32; 8] = [
-    0x47b6137b_u32,
-    0x44974d91_u32,
-    0x8824ad5b_u32,
-    0xa2b7289d_u32,
-    0x705495c7_u32,
-    0x2df1424b_u32,
-    0x9efc4947_u32,
-    0x5c6bfb31_u32,
-];
-
-/// Each block is 256 bits, broken up into eight contiguous "words", each consisting of 32 bits.
-/// Each word is thought of as an array of bits; each bit is either "set" or "not set".
-#[derive(Debug, Copy, Clone)]
-struct Block([u32; 8]);
-impl Block {
-    const ZERO: Block = Block([0; 8]);
-
-    /// takes as its argument a single unsigned 32-bit integer and returns a block in which each
-    /// word has exactly one bit set.
-    fn mask(x: u32) -> Self {
-        let mut result = [0_u32; 8];
-        for i in 0..8 {
-            // wrapping instead of checking for overflow
-            let y = x.wrapping_mul(SALT[i]);
-            let y = y >> 27;
-            result[i] = 1 << y;
-        }
-        Self(result)
-    }
-
-    #[inline]
-    #[cfg(target_endian = "little")]
-    fn to_le_bytes(self) -> [u8; 32] {
-        self.to_ne_bytes()
-    }
-
-    #[inline]
-    #[cfg(not(target_endian = "little"))]
-    fn to_le_bytes(self) -> [u8; 32] {
-        self.swap_bytes().to_ne_bytes()
-    }
-
-    #[inline]
-    fn to_ne_bytes(self) -> [u8; 32] {
-        unsafe { std::mem::transmute(self) }
-    }
-
-    #[inline]
-    #[cfg(not(target_endian = "little"))]
-    fn swap_bytes(mut self) -> Self {
-        self.0.iter_mut().for_each(|x| *x = x.swap_bytes());
-        self
-    }
-
-    /// setting every bit in the block that was also set in the result from mask
-    fn insert(&mut self, hash: u32) {
-        let mask = Self::mask(hash);
-        for i in 0..8 {
-            self[i] |= mask[i];
-        }
-    }
-
-    /// returns true when every bit that is set in the result of mask is also set in the block.
-    fn check(&self, hash: u32) -> bool {
-        let mask = Self::mask(hash);
-        for i in 0..8 {
-            if self[i] & mask[i] == 0 {
-                return false;
-            }
-        }
-        true
-    }
-}
-
-impl std::ops::Index<usize> for Block {
-    type Output = u32;
-
-    #[inline]
-    fn index(&self, index: usize) -> &Self::Output {
-        self.0.index(index)
-    }
-}
-
-impl std::ops::IndexMut<usize> for Block {
-    #[inline]
-    fn index_mut(&mut self, index: usize) -> &mut Self::Output {
-        self.0.index_mut(index)
-    }
-}
+use xxhash_rust::xxh64::xxh64;

Review Comment:
   Looks like (on OSX) this is also part of the speedup.


