
Posted to issues@spark.apache.org by "Aleksey Ponkin (JIRA)" <ji...@apache.org> on 2016/11/03 13:58:58 UTC

[jira] [Created] (SPARK-18252) Compressed BloomFilters

Aleksey Ponkin created SPARK-18252:
--------------------------------------

             Summary: Compressed BloomFilters
                 Key: SPARK-18252
                 URL: https://issues.apache.org/jira/browse/SPARK-18252
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.0.1
            Reporter: Aleksey Ponkin
            Priority: Minor


Since version 2.0, Spark has had a BloomFilter implementation: org.apache.spark.util.sketch.BloomFilterImpl. I have noticed that the current implementation uses a custom class, org.apache.spark.util.sketch.BitArray, which allocates the memory for the whole filter up front. So even a filter with only a small number of elements inserted will be pretty large when it needs to be serialized. Is there any interest in using RoaringBitmap (https://github.com/RoaringBitmap/RoaringBitmap) or javaewah (https://github.com/lemire/javaewah) to compress Bloom filters, or maybe in compressing them during the serialization stage?
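
To give a rough idea of the "compress during serialization" option, here is a minimal sketch that uses only the JDK. It is not Spark code: the class SparseBitsDemo and its methods are hypothetical, a plain long[] stands in for the filter's bit storage, and java.util.zip's Deflater is used only as a readily available compressor, not as a replacement for RoaringBitmap or javaewah.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical demo class; a long[] stands in for a Bloom filter's bit storage.
public class SparseBitsDemo {

    // Serialize the words as-is: a length prefix followed by every long.
    static byte[] writeRaw(long[] words) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(words.length);
            for (long w : words) {
                out.writeLong(w);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Same layout, but passed through a DEFLATE stream while writing.
    static byte[] writeDeflated(long[] words) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out =
                 new DataOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeInt(words.length);
            for (long w : words) {
                out.writeLong(w);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Inverse of writeDeflated, to check that nothing is lost.
    static long[] readDeflated(byte[] data) {
        try (DataInputStream in = new DataInputStream(
                 new InflaterInputStream(new ByteArrayInputStream(data)))) {
            long[] words = new long[in.readInt()];
            for (int i = 0; i < words.length; i++) {
                words[i] = in.readLong();
            }
            return words;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // About one million bits allocated up front, only ~100 of them set.
        long[] words = new long[1 << 14];
        Random rnd = new Random(42);
        for (int i = 0; i < 100; i++) {
            int bit = rnd.nextInt(words.length * 64);
            words[bit >>> 6] |= 1L << (bit & 63);
        }
        System.out.println("raw bytes:      " + writeRaw(words).length);
        System.out.println("deflated bytes: " + writeDeflated(words).length);
    }
}
```

For a nearly empty filter the deflated form should be much smaller than the raw one, because long runs of zero words compress well. The gain shrinks as the filter fills up, since the bits of a well-loaded Bloom filter look close to random.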


