Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2017/06/15 03:29:00 UTC
[jira] [Created] (PIG-5260) Separate bloom filter for each reducer of the join
Rohini Palaniswamy created PIG-5260:
---------------------------------------
Summary: Separate bloom filter for each reducer of the join
Key: PIG-5260
URL: https://issues.apache.org/jira/browse/PIG-5260
Project: Pig
Issue Type: New Feature
Reporter: Rohini Palaniswamy
Currently, bloom join allows specifying the number of bloom filters, and all of them are broadcast to each join vertex. The bloom filter partition logic is joinkey hashcode % num_filters. The reducer partition logic is joinkey hashcode % num_reducers. If we make the number of bloom filters equal to the number of reducers in the join, we can broadcast just bloom filter 0 to reducer 0, bloom filter 1 to reducer 1, and so on. However, a one-one edge will most likely prevent auto-reduce parallelism from being applied to the scatter-gather edge, so we need to see whether a custom one-one broadcast edge is needed for this.
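
To illustrate the partition alignment described above, here is a minimal Python sketch (not Pig or Tez code). The helper names partition and filters_needed_per_reducer are hypothetical, the integer range stands in for real join-key hash codes, and masking the hash to a non-negative value is an assumption borrowed from Hadoop's HashPartitioner; the issue itself only states hashcode % n.

```python
def partition(hash_code: int, num_partitions: int) -> int:
    """Map a join-key hash code to a partition index in [0, num_partitions)."""
    # Assumption: mask to a non-negative 31-bit value before taking the modulo.
    return (hash_code & 0x7FFFFFFF) % num_partitions


def filters_needed_per_reducer(hash_codes, num_filters, num_reducers):
    """Return {reducer index: set of bloom filter indices its keys fall into}."""
    needed = {r: set() for r in range(num_reducers)}
    for h in hash_codes:
        needed[partition(h, num_reducers)].add(partition(h, num_filters))
    return needed


# Stand-in for join-key hash codes (hypothetical data).
hash_codes = range(-1000, 1000)

# Same count for filters and reducers: reducer i only ever needs filter i.
aligned = filters_needed_per_reducer(hash_codes, num_filters=4, num_reducers=4)

# Different counts: every reducer can receive keys from every filter,
# so all filters would have to be broadcast to all reducers.
mismatched = filters_needed_per_reducer(hash_codes, num_filters=3, num_reducers=4)

print(aligned)
print(mismatched)
```

When the two counts match, both partition functions are the same computation on the same hash code, which is why filter i alone is sufficient for reducer i. This sketch says nothing about how the edge itself would be implemented.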