Posted to common-dev@hadoop.apache.org by "Yitong Zhou (JIRA)" <ji...@apache.org> on 2015/03/18 22:34:38 UTC

[jira] [Created] (HADOOP-11727) Make org.hadoop.util.bloom.BloomFilter returns the expected false positive probability

Yitong Zhou created HADOOP-11727:
------------------------------------

             Summary: Make org.hadoop.util.bloom.BloomFilter returns the expected false positive probability
                 Key: HADOOP-11727
                 URL: https://issues.apache.org/jira/browse/HADOOP-11727
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Yitong Zhou


When using a Bloom filter, it is sometimes handy to know the current expected false positive rate, which is (bit set cardinality / vector size)^(number of hash functions). When that rate gets too high, we can then choose to rebuild the Bloom filter at a larger size.

The code would look like this:
{code}
  /**
   * Returns the expected false positive probability of the current filter.
   *
   * @return The expected false positive probability
   */
  public double expectedFalsePositiveProbability() {
    return Math.pow((double) bits.cardinality() / vectorSize, nbHash);
  }
{code}

Does this sound like a reasonable minor function to add to the code base?
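
To make the intended use concrete, here is a minimal stand-alone sketch of the same formula together with a rebuild check. It assumes a plain java.util.BitSet in place of the filter's internal bits field. The class name BloomFpEstimate, the shouldRebuild helper, and the 0.05 threshold are hypothetical and only for illustration; none of them exist in Hadoop.

```java
import java.util.BitSet;

// Hypothetical stand-alone helper, not part of Hadoop: it mirrors the
// proposed method but takes the bit set and parameters as arguments.
final class BloomFpEstimate {

  // (number of set bits / vector size) ^ (number of hash functions)
  static double expectedFalsePositiveProbability(BitSet bits, int vectorSize, int nbHash) {
    if (vectorSize <= 0 || nbHash <= 0) {
      throw new IllegalArgumentException("vectorSize and nbHash must be positive");
    }
    return Math.pow((double) bits.cardinality() / vectorSize, nbHash);
  }

  // Assumed policy: rebuild once the estimate exceeds a caller-chosen limit.
  static boolean shouldRebuild(BitSet bits, int vectorSize, int nbHash, double maxFpRate) {
    return expectedFalsePositiveProbability(bits, vectorSize, nbHash) > maxFpRate;
  }

  public static void main(String[] args) {
    int vectorSize = 8;
    int nbHash = 2;
    BitSet bits = new BitSet(vectorSize);
    bits.set(0);
    bits.set(3);
    // 2 of 8 bits set with 2 hash functions: (2/8)^2 = 0.0625
    System.out.println(expectedFalsePositiveProbability(bits, vectorSize, nbHash));
    // 0.0625 is above the illustrative 0.05 limit, so this prints true
    System.out.println(shouldRebuild(bits, vectorSize, nbHash, 0.05));
  }
}
```

In the real filter the check would presumably run after a batch of add() calls, with the threshold left to the caller.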


