Posted to dev@flink.apache.org by "Chengxiang Li (JIRA)" <ji...@apache.org> on 2015/06/18 11:03:01 UTC
[jira] [Created] (FLINK-2240) Use BloomFilter to minimize build side
records which spilled to disk in Hybrid-Hash-Join
Chengxiang Li created FLINK-2240:
------------------------------------
Summary: Use BloomFilter to minimize build side records which spilled to disk in Hybrid-Hash-Join
Key: FLINK-2240
URL: https://issues.apache.org/jira/browse/FLINK-2240
Project: Flink
Issue Type: Improvement
Components: Core
Reporter: Chengxiang Li
Priority: Minor
In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the corresponding partitions of the big table are spilled to disk during the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records that would otherwise be spilled, we may greatly reduce the size of the spilled big table files and save the disk IO cost of writing them and reading them back.
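To illustrate the idea, here is a minimal sketch in Java. It is not Flink's implementation: the class SpillBloomFilterSketch, the nested BloomFilter, and the method probeKeysToSpill are hypothetical names, keys are plain ints, and an inner join is assumed. A Bloom filter has no false negatives, so a probe record whose key is rejected by the filter cannot match any spilled build record and does not need to be written to disk. False positives only cause a few extra records to be spilled.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch, not Flink code: filter probe-side keys with a Bloom
// filter that was populated while build-side records were spilled.
final class SpillBloomFilterSketch {

    /** Minimal Bloom filter over int join keys (illustrative only). */
    static final class BloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        BloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        /** Called for every build-side key that is written to the spill file. */
        void add(int key) {
            int h1 = mix(key);
            int h2 = mix(h1 ^ 0x9E3779B9);
            for (int i = 0; i < numHashes; i++) {
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        /** False means the key was definitely never added. */
        boolean mightContain(int key) {
            int h1 = mix(key);
            int h2 = mix(h1 ^ 0x9E3779B9);
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }

        // Simple integer bit mixer; any reasonable hash would do here.
        private static int mix(int x) {
            x ^= x >>> 16;
            x *= 0x85EBCA6B;
            x ^= x >>> 13;
            x *= 0xC2B2AE35;
            x ^= x >>> 16;
            return x;
        }
    }

    /**
     * Assumes every probe key belongs to a partition whose build side was
     * spilled. Returns only the probe keys that still have to be spilled.
     */
    static List<Integer> probeKeysToSpill(int[] spilledBuildKeys, int[] probeKeys) {
        // Assumed sizing: 16 bits per build key and 3 hash functions.
        int numBits = Math.max(64, spilledBuildKeys.length * 16);
        BloomFilter filter = new BloomFilter(numBits, 3);
        for (int key : spilledBuildKeys) {
            filter.add(key);
        }
        List<Integer> toSpill = new ArrayList<>();
        for (int key : probeKeys) {
            if (filter.mightContain(key)) {
                toSpill.add(key);
            }
        }
        return toSpill;
    }
}
```

The filter size and number of hash functions in this sketch are arbitrary. In practice they would trade memory taken from the join against the false positive rate, and therefore against how much probe-side data is still spilled.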