Posted to common-user@hadoop.apache.org by Ivan Balashov <ib...@iponweb.net> on 2009/06/05 14:31:03 UTC

Choice of Bloom Filter implementation in Hadoop application

Dear all,

As part of optimizing our Hadoop application, we are trying to use a
Bloom filter so that needless records are not passed through to the
reduce stage.
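
To make the idea concrete, here is a minimal sketch of the kind of
filter we have in mind. It is a hypothetical stand-in written against
the Java standard library only, not Hadoop's own Bloom filter class;
the class name SimpleBloomFilter, its methods, and the hashing scheme
are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical stand-in Bloom filter; NOT Hadoop's implementation. */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // FNV-1a style 32-bit hash; the seed is mixed into the initial state.
    private static int hash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th bit position is derived from two base hashes.
    private int index(byte[] data, int i) {
        int h1 = hash(data, 0);
        int h2 = hash(data, 0x9E3779B9);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Records a key by setting all of its bit positions. */
    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(data, i));
        }
    }

    /** False means definitely absent; true means possibly present. */
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(data, i))) {
                return false;
            }
        }
        return true;
    }
}
```

In a map task, the filter would be populated up front with the keys
worth keeping, and a record would be emitted only when mightContain
returns true for its key. False positives let a few needless records
through, but no wanted record is ever dropped.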

We noticed that the Hadoop dev team recently introduced a BloomMapFile
implementation (https://issues.apache.org/jira/browse/HADOOP-3063),
primarily intended for internal Hadoop use.

Our question is: can we use Hadoop's Bloom filter implementation for
filtering in our application, or is it intended solely for internal
use, in which case we would be better off considering another
implementation?

Please let me know if I should provide more detail on this matter.

Thanks,


--
Kind regards,
Ivan