You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/01/29 13:00:13 UTC

[jira] [Created] (CASSANDRA-6633) Investigate Bloom Filter Improvements

Benedict created CASSANDRA-6633:
-----------------------------------

             Summary: Investigate Bloom Filter Improvements
                 Key: CASSANDRA-6633
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6633
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Benedict


Investigate various possible improvements to our bloom filters:

1) Dynamic resizing would be useful. There are a few ways this could be achieved: with some modification, downsampling could be supported; partitioning the hash functions so that we may select the number of hashes/bits dynamically, by loading/unloading a given partition; and there are some related data structures, such as Quotient Filters that support resizing and merging.
2) Faster loading: should be possible to mmap the bloom filter representation on disk
3) Most ambitious of all, would be to try to reduce the memory requirement of bloom filters. Golomb Coded Sets [1] are a possibility, as are other compressed hash structures.

[1|http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf]
[2|http://www.it-c.dk/people/pagh/papers/bloom.pdf]





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)