Posted to github@arrow.apache.org by "ozgrakkurt (via GitHub)" <gi...@apache.org> on 2023/05/13 23:10:32 UTC

[GitHub] [arrow-rs] ozgrakkurt opened a new issue, #4213: Use optimized implementation of bloom filter

ozgrakkurt opened a new issue, #4213:
URL: https://github.com/apache/arrow-rs/issues/4213

   Hey!
   
   I implemented https://github.com/ozgrakkurt/sbbf-rs.
   
   It is an implementation of Parquet bloom filters. I checked it against the implementation in `parquet2`, and it produces the same output.
   
   I would like to integrate it here if that makes sense.
   
   A big problem may be that it has several architecture-specific implementations, including an `aarch64::neon` one, which wouldn't get tested on GitHub CI.
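
For context, here is a minimal scalar sketch of the split block Bloom filter layout that the Parquet format describes, which any SIMD variant would have to reproduce bit for bit. It is not taken from `sbbf-rs`, `parquet2`, or `parquet`: the names `ScalarSbbf`, `insert_hash`, and `contains_hash` are hypothetical, and the sketch assumes the caller has already hashed the value to a `u64` (the format uses xxHash64 for that step, which is omitted here).

```rust
/// Salt constants from the Parquet split block Bloom filter specification.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// One 256-bit block: eight 32-bit words.
type Block = [u32; 8];

/// Build the per-word mask: exactly one bit set in each of the eight words.
fn mask(x: u32) -> Block {
    let mut m = [0u32; 8];
    for i in 0..8 {
        // The top 5 bits of the salted product pick a bit position in 0..=31.
        m[i] = 1u32 << (x.wrapping_mul(SALT[i]) >> 27);
    }
    m
}

/// Hypothetical minimal filter type, for illustration only.
pub struct ScalarSbbf {
    blocks: Vec<Block>,
}

impl ScalarSbbf {
    pub fn new(num_blocks: usize) -> Self {
        assert!(num_blocks > 0 && num_blocks <= u32::MAX as usize);
        Self { blocks: vec![[0u32; 8]; num_blocks] }
    }

    /// Map the upper 32 bits of the hash onto a block index without a modulo.
    fn block_index(&self, hash: u64) -> usize {
        (((hash >> 32) * self.blocks.len() as u64) >> 32) as usize
    }

    /// Set the eight mask bits derived from the lower 32 bits of the hash.
    pub fn insert_hash(&mut self, hash: u64) {
        let idx = self.block_index(hash);
        let m = mask(hash as u32);
        for (word, bit) in self.blocks[idx].iter_mut().zip(m.iter()) {
            *word |= *bit;
        }
    }

    /// Report whether all eight mask bits are set (false positives possible).
    pub fn contains_hash(&self, hash: u64) -> bool {
        let idx = self.block_index(hash);
        let m = mask(hash as u32);
        self.blocks[idx].iter().zip(m.iter()).all(|(word, bit)| word & bit != 0)
    }
}
```

A scalar reference like this could also act as the oracle in a differential test: feed the same hashes to every architecture-specific variant and compare the resulting blocks, on whatever hardware the CI runner happens to provide.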
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ozgrakkurt commented on issue #4213: Use optimized implementation of bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on issue #4213:
URL: https://github.com/apache/arrow-rs/issues/4213#issuecomment-1599441632

   I added a `parquet` implementation by copy-pasting the `bloom_filter` file.
   
   Results on a Mac M1:
   <img width="807" alt="image" src="https://github.com/apache/arrow-rs/assets/91746947/95a30cef-ad54-44c5-a218-22d050eeb36e">
   
   Results on a VPS (x86_64) with 1 GB of RAM and 1 vCPU:
   <img width="807" alt="image" src="https://github.com/apache/arrow-rs/assets/91746947/5a0034ed-463a-4f29-8e65-fdcafa4b61cd">
   




[GitHub] [arrow-rs] ozgrakkurt commented on issue #4213: Use optimized implementation of bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on issue #4213:
URL: https://github.com/apache/arrow-rs/issues/4213#issuecomment-1546932021

   I added it to the benchmarks here: https://github.com/crepererum/pdatastructs.rs/pull/126.




[GitHub] [arrow-rs] ozgrakkurt commented on issue #4213: Use optimized implementation of bloom filter

Posted by "ozgrakkurt (via GitHub)" <gi...@apache.org>.
ozgrakkurt commented on issue #4213:
URL: https://github.com/apache/arrow-rs/issues/4213#issuecomment-1599377512

   @tustvold I added a benchmark against the `parquet2` implementation [on the repo](https://github.com/ozgrakkurt/sbbf-rs). Can you run it if you have time?
   
   I ran it on an aarch64 CPU and these are the results I get:
   ```
   INSERT
   parquet2 -> 76 ns
   sbbf-rs -> 3.5 ns
   
   CONTAINS (this seems to be dominated by hashing time and dynamic dispatch in sbbf-rs; the compiler also seems to optimize the parquet2 code pretty well)
   parquet2 -> 3.4 ns
   sbbf-rs -> 2.8 ns
   
   CONTAINS (without hashing and dynamic dispatch in sbbf-rs; this requires modifying the library a little specifically for aarch64, since all aarch64 CPUs support `neon` SIMD instructions. I don't want to release with this optimization since I'm not sure it is worth complicating the code)
   parquet2 -> 1.9 ns
   sbbf-rs -> 500 ps
   ```
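
To illustrate the dispatch cost mentioned above, here is a rough sketch of one common way to choose an implementation once at runtime and then call it through a function pointer. It is an assumption about the general technique, not the actual `sbbf-rs` code: `check`, `check_scalar`, `check_avx2`, and `select_check` are hypothetical names, and only an x86_64 AVX2 path is shown. On aarch64, where `neon` is always available, the choice could instead be made at compile time with `#[cfg(target_arch = "aarch64")]`, which removes the indirect call.

```rust
use std::sync::OnceLock;

/// One 256-bit block: eight 32-bit words.
type Block = [u32; 8];
type CheckFn = fn(&Block, &Block) -> bool;

/// Portable fallback: every bit of the mask must be set in the block.
fn check_scalar(block: &Block, mask: &Block) -> bool {
    block.iter().zip(mask.iter()).all(|(w, m)| w & m == *m)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn check_avx2_impl(block: &Block, mask: &Block) -> bool {
    use std::arch::x86_64::{__m256i, _mm256_loadu_si256, _mm256_testc_si256};
    unsafe {
        let b = _mm256_loadu_si256(block.as_ptr() as *const __m256i);
        let m = _mm256_loadu_si256(mask.as_ptr() as *const __m256i);
        // testc returns 1 when (!b & m) == 0, i.e. all mask bits are set in b.
        _mm256_testc_si256(b, m) == 1
    }
}

#[cfg(target_arch = "x86_64")]
fn check_avx2(block: &Block, mask: &Block) -> bool {
    // SAFETY: this wrapper is only selected after AVX2 was detected at runtime.
    unsafe { check_avx2_impl(block, mask) }
}

/// Pick the best available implementation for the current CPU.
fn select_check() -> CheckFn {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            return check_avx2;
        }
    }
    check_scalar
}

/// Public entry point: detection runs once, later calls reuse the pointer.
pub fn check(block: &Block, mask: &Block) -> bool {
    static IMPL: OnceLock<CheckFn> = OnceLock::new();
    (IMPL.get_or_init(select_check))(block, mask)
}
```

The indirect call through `IMPL` is the kind of overhead that can matter when the operation itself takes about a nanosecond, which may explain part of the gap between the two CONTAINS measurements.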




[GitHub] [arrow-rs] tustvold commented on issue #4213: Use optimized implementation of bloom filter

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4213:
URL: https://github.com/apache/arrow-rs/issues/4213#issuecomment-1546928357

   Do you have any performance benchmark results you could share?

