Posted to issues@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2015/08/06 19:02:05 UTC

[jira] [Resolved] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

     [ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-2240.
---------------------------------
       Resolution: Implemented
    Fix Version/s: 0.10

Implemented in 61dcae391cb3b45ba3aff47d4d9163889d2958a4

Thank you for the contribution!

> Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-2240
>                 URL: https://issues.apache.org/jira/browse/FLINK-2240
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>            Priority: Minor
>             Fix For: 0.10
>
>
> In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the corresponding partitions of the big table are spilled to disk during the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records that would otherwise be spilled to disk, this may greatly reduce the size of the spilled big table files and save the disk I/O cost of writing and later re-reading them.
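
The idea described in the issue can be illustrated with a minimal sketch. This is not the code from the commit referenced above; the class name SpillBloomFilterSketch, the method countProbeRecordsToSpill, the use of long join keys, and the filter sizes are all assumptions made for illustration. Keys of build-side records that go to a spilled partition are added to a Bloom filter; a probe-side record headed for that partition is written to disk only if the filter reports that its key might be present.

```java
import java.util.BitSet;

// Hypothetical sketch of the technique; not Flink's actual implementation.
final class SpillBloomFilterSketch {

    /** Minimal Bloom filter over long join keys (illustrative only). */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            if (numBits <= 0 || numHashes <= 0) {
                throw new IllegalArgumentException("numBits and numHashes must be positive");
            }
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        /** Called for each build-side key that is spilled to disk. */
        void add(long key) {
            long h = mix(key);
            int h1 = (int) h;
            int h2 = (int) (h >>> 32);
            for (int i = 0; i < numHashes; i++) {
                // Double hashing: derive the i-th bit position from two base hashes.
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        /** False means the key was definitely never added; true may be a false positive. */
        boolean mightContain(long key) {
            long h = mix(key);
            int h1 = (int) h;
            int h2 = (int) (h >>> 32);
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }

        // 64-bit finalizer-style bit mixer, used here only to spread the key bits.
        private static long mix(long h) {
            h ^= h >>> 33;
            h *= 0xff51afd7ed558ccdL;
            h ^= h >>> 33;
            h *= 0xc4ceb9fe1a85ec53L;
            h ^= h >>> 33;
            return h;
        }
    }

    /**
     * Builds a filter from the spilled build-side keys and counts how many
     * probe-side keys would still have to be written to disk.
     */
    static int countProbeRecordsToSpill(long[] spilledBuildKeys, long[] probeKeys,
                                        int numBits, int numHashes) {
        BloomFilter filter = new BloomFilter(numBits, numHashes);
        for (long key : spilledBuildKeys) {
            filter.add(key);
        }
        int toSpill = 0;
        for (long key : probeKeys) {
            if (filter.mightContain(key)) {
                toSpill++; // possible match: must be spilled and joined later
            }
            // Otherwise the record cannot match any spilled build record and can be dropped.
        }
        return toSpill;
    }

    public static void main(String[] args) {
        long[] spilledBuildKeys = {1L, 2L, 3L};
        long[] probeKeys = {1L, 2L, 3L, 100L, 200L, 300L};
        int toSpill = countProbeRecordsToSpill(spilledBuildKeys, probeKeys, 1024, 3);
        System.out.println("probe records spilled: " + toSpill + " of " + probeKeys.length);
    }
}
```

A Bloom filter never reports a false negative, so no matching probe record is lost; false positives only mean that some non-matching records are still spilled. Dropping a record that fails the filter is safe only for join types that do not need to emit unmatched probe-side records.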


