Posted to dev@commons.apache.org by Claude Warren <cl...@xenei.com> on 2020/01/23 20:19:39 UTC

[collections] Bloom filter signature calculation

The HashFunctionIdentity.getSignature() method is intended to be used as a
quick comparison of HashFunctionIdentity instances.  As such it is supposed
to encompass the name, signedness and process, as well as some indication
that the function implementation is the same as any other implementation of
the same function.  To do this it calls the hashing function (apply()) with
a seed of 0 (zero).

There was recent work on the commons-codec Murmur3 implementations that
dealt with sign extension errors in the seed.  In light of this, I wonder
if the seed for the signature shouldn't be negative rather than zero as
this may have a higher probability of exposing implementation issues.  But
then I wonder if I am tainted by the Murmur3 issue.
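
To illustrate why a negative seed might expose more than zero does, here is
a minimal sketch.  It is not the commons-codec code; the class and method
names are made up for illustration.  It only shows the two ways an int seed
can be widened to a long: with a seed of 0 they agree, so a signature
computed with seed 0 cannot tell them apart, while a negative seed can.

```java
public class SeedWidening {

    // Plain widening: the upper 32 bits are copies of the sign bit.
    static long signExtended(int seed) {
        return seed;
    }

    // Masked widening: the seed is treated as an unsigned 32-bit value.
    static long zeroExtended(int seed) {
        return seed & 0xffffffffL;
    }

    public static void main(String[] args) {
        for (int seed : new int[] {0, 1, -1}) {
            long s = signExtended(seed);
            long z = zeroExtended(seed);
            // Print both widenings and whether they agree for this seed.
            System.out.printf("seed=%d sign=%016x zero=%016x same=%b%n",
                    seed, s, z, s == z);
        }
    }
}
```

For non-negative seeds the two widenings are identical; they differ only
when the sign bit of the seed is set.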

The getSignature() method is supposed to be implemented as:

 apply( String.format( "%s-%s-%s", getName().toUpperCase( Locale.ROOT ),
     getSignedness(), getProcess() ).getBytes( StandardCharsets.UTF_8 ), 0 );
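
The calculation above can be sketched as a self-contained program.  This is
an illustration under stated assumptions, not the library code: toyApply()
is a hypothetical stand-in for apply() (a 64-bit FNV-1a with the seed folded
in), and the name, signedness and process strings are example values only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class SignatureSketch {

    // Builds the "NAME-SIGNEDNESS-PROCESS" buffer, upper-casing the name
    // with Locale.ROOT so the result does not depend on the default locale.
    static byte[] signatureBuffer(String name, String signedness, String process) {
        return String.format("%s-%s-%s", name.toUpperCase(Locale.ROOT), signedness, process)
                .getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical stand-in for apply(byte[], int): 64-bit FNV-1a with the
    // seed (treated as unsigned) folded into the initial state.
    static long toyApply(byte[] buffer, int seed) {
        long hash = 0xcbf29ce484222325L ^ (seed & 0xffffffffL);
        for (byte b : buffer) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // The signature is the hash of the identity buffer with a seed of 0.
    static long signature(String name, String signedness, String process) {
        return toyApply(signatureBuffer(name, signedness, process), 0);
    }

    public static void main(String[] args) {
        // Example values only; any name/signedness/process triple works.
        System.out.printf("%016x%n", signature("Murmur3_x64_128", "SIGNED", "CYCLIC"));
    }
}
```

Two implementations that report the same name, signedness and process but
hash the buffer differently would produce different signatures, which is
the property the quick comparison relies on.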

Thoughts?

Claude

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren