Posted to issues@spark.apache.org by "Aleksey Ponkin (JIRA)" <ji...@apache.org> on 2016/11/04 07:58:58 UTC

[jira] [Updated] (SPARK-18252) Improve serialized BloomFilter size

     [ https://issues.apache.org/jira/browse/SPARK-18252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Ponkin updated SPARK-18252:
-----------------------------------
    Summary: Improve serialized BloomFilter size  (was: Compressed BloomFilters)

> Improve serialized BloomFilter size
> -----------------------------------
>
>                 Key: SPARK-18252
>                 URL: https://issues.apache.org/jira/browse/SPARK-18252
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.1
>            Reporter: Aleksey Ponkin
>            Priority: Minor
>
> Since version 2.0, Spark has a BloomFilter implementation - org.apache.spark.util.sketch.BloomFilterImpl. I have noticed that the current implementation uses a custom class, org.apache.spark.util.sketch.BitArray, which allocates all of the filter's memory up front. So even filters with only a small number of inserted elements will be pretty large when they need to be serialized. Is there any interest in using [RoaringBitmap|https://github.com/RoaringBitmap/RoaringBitmap] or [JavaEWAH|https://github.com/lemire/javaewah] to compress bloom filters, or perhaps in compressing them during the serialization stage?
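> A minimal sketch of the serialization-stage compression idea, assuming only the public org.apache.spark.util.sketch.BloomFilter API (the GZIP wrapping and the class/method names below are illustrative, not a concrete proposal):
>
>     import java.io.ByteArrayInputStream;
>     import java.io.ByteArrayOutputStream;
>     import java.io.IOException;
>     import java.util.zip.GZIPInputStream;
>     import java.util.zip.GZIPOutputStream;
>
>     import org.apache.spark.util.sketch.BloomFilter;
>
>     public class CompressedBloomFilterSketch {
>
>         // Serialize through a GZIP stream; a sparsely populated BitArray is
>         // mostly zero words, so it compresses very well.
>         static byte[] serializeCompressed(BloomFilter filter) throws IOException {
>             ByteArrayOutputStream bytes = new ByteArrayOutputStream();
>             try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
>                 filter.writeTo(gzip);
>             }
>             return bytes.toByteArray();
>         }
>
>         static BloomFilter deserializeCompressed(byte[] data) throws IOException {
>             try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
>                 return BloomFilter.readFrom(gzip);
>             }
>         }
>
>         public static void main(String[] args) throws IOException {
>             // Filter sized for 10 million items but holding only 100 entries,
>             // i.e. the case where the eagerly allocated BitArray is wasteful.
>             BloomFilter filter = BloomFilter.create(10_000_000);
>             for (long i = 0; i < 100; i++) {
>                 filter.putLong(i);
>             }
>             byte[] compressed = serializeCompressed(filter);
>             System.out.println("compressed size: " + compressed.length + " bytes");
>
>             BloomFilter restored = deserializeCompressed(compressed);
>             System.out.println("contains 42: " + restored.mightContainLong(42L));
>         }
>     }
>
> Wrapping the existing writeTo/readFrom in a compressing stream only shrinks the serialized size; using RoaringBitmap or JavaEWAH inside the filter would additionally reduce the in-memory footprint, at the cost of changing the serialized format.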



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org