Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2020/03/17 16:58:07 UTC

[GitHub] [commons-collections] aherbert commented on issue #131: Added caching hasher

URL: https://github.com/apache/commons-collections/pull/131#issuecomment-600184152
 
 
   This class makes a big assumption about how a CYCLIC HashFunction produces long output.
   
   If the function does not produce all of its other values by adding a fixed increment to the first long output (the one obtained with seed = 0), then this CachingHasher will create different indexes from a DynamicHasher using the same CYCLIC hash function.
   
   This would happen, for instance, if the CYCLIC function creates further output using multiplications and rotations.
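
To illustrate the divergence, here is a minimal sketch. `CyclicModel`, `ADDITIVE` and `MIXING` are hypothetical stand-ins, not the commons-collections `HashFunction` API. The sketch also assumes that a caching hasher stores a base value plus an increment and extrapolates from them, and that indexes are reduced with `Math.floorMod`.

```java
import java.util.Arrays;

final class CyclicDivergenceDemo {

    /** Hypothetical model of a CYCLIC function: the long produced by call number n for one item. */
    interface CyclicModel {
        long value(long itemHash, int n);
    }

    /** Fixed increment: every call adds the same constant to the previous value. */
    static final CyclicModel ADDITIVE = (h, n) -> h + n * 7L;

    /** Multiply and rotate: every call mixes the previous value instead of adding to it. */
    static final CyclicModel MIXING = (h, n) -> {
        long v = h;
        for (int i = 0; i < n; i++) {
            v = Long.rotateLeft(v * 3, 1);
        }
        return v;
    };

    /** Indexes obtained by calling the function once per index (assumed DynamicHasher-like behaviour). */
    static int[] direct(CyclicModel f, long itemHash, int k, int numberOfBits) {
        int[] idx = new int[k];
        for (int n = 0; n < k; n++) {
            idx[n] = (int) Math.floorMod(f.value(itemHash, n), (long) numberOfBits);
        }
        return idx;
    }

    /** Indexes extrapolated from two cached longs, assuming a fixed increment. */
    static int[] cached(CyclicModel f, long itemHash, int k, int numberOfBits) {
        long base = f.value(itemHash, 0);
        long increment = f.value(itemHash, 1) - base;
        int[] idx = new int[k];
        for (int n = 0; n < k; n++) {
            idx[n] = (int) Math.floorMod(base + n * increment, (long) numberOfBits);
        }
        return idx;
    }

    public static void main(String[] args) {
        // The two strategies agree for ADDITIVE and disagree from the third index on for MIXING.
        System.out.println(Arrays.toString(direct(ADDITIVE, 1L, 4, 1000)));
        System.out.println(Arrays.toString(cached(ADDITIVE, 1L, 4, 1000)));
        System.out.println(Arrays.toString(direct(MIXING, 1L, 4, 1000)));
        System.out.println(Arrays.toString(cached(MIXING, 1L, 4, 1000)));
    }
}
```

The first two indexes always match, because the increment is derived from them. Only the extrapolated indexes can differ.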
   
   Consequently, you can add this CachingHasher to a BloomFilter that uses the same HashFunctionIdentity, but the CachingHasher output will be inconsistent with items that may have already been added by a different hasher using the same HashFunction.
   
   Either:
   
   - You test the cyclic hash function by calling it once to get the increment (as is currently done) and then again to test if your prediction for the next output is correct based on the assumption of a fixed increment.
   - You make it clear in the javadoc that this function can output different indexes than a DynamicHasher when the cyclic function is not using a fixed increment.
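
The first option could look roughly like the sketch below. `SeededHash` is a hypothetical stand-in for the cyclic function, not the commons-collections interface, and the sketch assumes the increment is taken as the difference between the seed = 1 and seed = 0 outputs. Calls are made in seed order, so a stateful function is exercised the same way it would be in normal use.

```java
final class FixedIncrementProbe {

    /** Hypothetical stand-in for a cyclic hash function; not the commons-collections interface. */
    interface SeededHash {
        long apply(byte[] data, int seed);
    }

    /**
     * Returns true if the outputs for seeds 2 .. 1 + extraCalls match the
     * values predicted from the first two outputs under a fixed increment.
     */
    static boolean looksLikeFixedIncrement(SeededHash f, byte[] data, int extraCalls) {
        long previous = f.apply(data, 0);
        long increment = f.apply(data, 1) - previous;
        previous += increment; // value observed for seed = 1
        for (int seed = 2; seed < 2 + extraCalls; seed++) {
            long predicted = previous + increment;
            long actual = f.apply(data, seed);
            if (actual != predicted) {
                return false; // the fixed increment assumption does not hold
            }
            previous = actual;
        }
        return true;
    }
}
```

A check like this on one input is only a heuristic: passing it does not prove that the function uses a fixed increment for every input, so the javadoc note may still be needed.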
   
   This is an example where the removal of the hashing functionality from within the BloomFilter can lead to a compromised internal state due to lack of encapsulation.
   
   If a cyclic hash function is mandated to use a fixed increment to generate subsequent hash values, then this is not a problem, but that requirement should be added to the ProcessType javadoc.
   
   I was interpreting the ProcessType as either you use the entire byte[] representation of the object for each seeded hash (iterative) or you do not (cyclic). There is nothing in there to indicate it uses summation. It currently states:
   ```
   Subsequent calls with a non-zero seed use the state to generate a new value.
   ```
   
   
