Posted to notifications@asterixdb.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/01/17 16:59:00 UTC

[jira] [Commented] (ASTERIXDB-2243) Bloomfilter size is overly calculated for update-heavy workloads

    [ https://issues.apache.org/jira/browse/ASTERIXDB-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329002#comment-16329002 ] 

ASF subversion and git services commented on ASTERIXDB-2243:
------------------------------------------------------------

Commit e54115d7f264ca102fef06578989b6285b35226a in asterixdb's branch refs/heads/master from [~luochen01]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=e54115d ]

[ASTERIXDB-2243][STO] Fix BloomFilter size estimation

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
- Fix the bloom filter size estimation by using the
actual number of elements after bulk loading. This prevents
the bloom filter size from growing larger and larger under
update-heavy workloads, where most of the ingested records
are deleted through merge.

Change-Id: Ib4054797d969efcfceb86f91b5321d34480e25c3
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2285
Sonar-Qube: Jenkins <je...@fulliautomatix.ics.uci.edu>
Reviewed-by: Michael Blow <mb...@apache.org>
Integration-Tests: Jenkins <je...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>
Contrib: Jenkins <je...@fulliautomatix.ics.uci.edu>


> Bloomfilter size is overly calculated for update-heavy workloads
> ----------------------------------------------------------------
>
>                 Key: ASTERIXDB-2243
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2243
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: STO - Storage
>            Reporter: Chen Luo
>            Assignee: Chen Luo
>            Priority: Major
>
> The current bloom filter size calculation assumes the data is append-only, with no updates. Each bloom filter maintains its number of elements. When a new bloom filter is bulk loaded through a merge, its size is simply the sum of the sizes of all merged filters. However, under an update-heavy workload, even though the actual size of the merged disk component does not increase, the estimated bloom filter size keeps increasing and consumes too much space.
>  
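
To illustrate the difference between the two estimates described above, here is a
minimal Java sketch. It is not AsterixDB code: the class and method names are
hypothetical, each disk component is modeled as a plain set of keys, and the 1%
false positive rate is an assumed value. The sizing formula is the standard
bloom filter one, bits = -n * ln(p) / (ln 2)^2.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and data model are assumptions, not AsterixDB's API.
final class BloomFilterSizeSketch {

    // Standard bloom filter sizing: bits = -n * ln(p) / (ln 2)^2.
    static long bitsFor(long numElements, double falsePositiveRate) {
        if (numElements <= 0) {
            return 0;
        }
        double ln2 = Math.log(2.0);
        return (long) Math.ceil(-numElements * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    // Estimate described in the issue: sum the element counts of all merged components.
    static long summedEstimate(List<Set<Integer>> components) {
        long total = 0;
        for (Set<Integer> component : components) {
            total += component.size();
        }
        return total;
    }

    // Estimate described in the fix: count the elements that survive the merge.
    // Here a merge is modeled as a set union, so repeated keys are counted once.
    static long actualCount(List<Set<Integer>> components) {
        Set<Integer> merged = new HashSet<>();
        for (Set<Integer> component : components) {
            merged.addAll(component);
        }
        return merged.size();
    }

    // Builds an update-heavy input: every component rewrites the same keys.
    static List<Set<Integer>> updateHeavyComponents(int numComponents, int numKeys) {
        List<Set<Integer>> components = new ArrayList<>();
        for (int i = 0; i < numComponents; i++) {
            Set<Integer> keys = new HashSet<>();
            for (int k = 0; k < numKeys; k++) {
                keys.add(k);
            }
            components.add(keys);
        }
        return components;
    }

    public static void main(String[] args) {
        List<Set<Integer>> components = updateHeavyComponents(4, 1000);
        double fpr = 0.01; // assumed false positive rate
        long summed = summedEstimate(components);
        long actual = actualCount(components);
        System.out.println("summed elements: " + summed + ", bits: " + bitsFor(summed, fpr));
        System.out.println("actual elements: " + actual + ", bits: " + bitsFor(actual, fpr));
    }
}
```

With every component holding the same keys, the summed estimate grows with the
number of merged components while the actual count stays flat, which is the
growth the issue describes.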


