You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2019/10/24 10:32:23 UTC

[GitHub] [commons-collections] aherbert commented on issue #83: WIP: Initial bloom filter code contribution

aherbert commented on issue #83: WIP: Initial bloom filter code contribution
URL: https://github.com/apache/commons-collections/pull/83#issuecomment-545856096
 
 
   I think the point is:
   
   ```
   "either test or set k bits using positions chosen using hash functions.
   ...
   if any of the bits are not set, the element certainly does not exist."
   ```
   
   So you can do the comparison with the first hashing function, then the next, etc, until you have processed all the hashing functions. At any point if a match is not found you can stop thus allowing early exit of your entire comparison.
   
   If the hashing function outputs a single bit position then I can see value in partial computation of the entire combined hash.
   
   So this is a question of whether part of the entire hash can be computed in a stage and used against the full bits of the Bloom filter. In your example for a filter of length 225,000 bits built using 16 hashing functions are any of these true:
   
   1. Each hashing function identifies only 1 bit
   2. Each hashing function identifies bits from a non-overlapping range of the 225,000 bits
   
   In the example below the hashing function identifies 2 bits. For target A you only need to check the first bit to know that the object is not in the filter. For target B you have to check both.
   
   ```
   Bloom filter:
   -----------1--------1------11---------1---------------11---------------1---
   
   Target A (not present):
   --1-----------------------------------------------------------------------1
   
   Target B (may be present):
   -----------1--------------------------1------------------------------------
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services