Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2018/06/13 21:57:00 UTC

[jira] [Comment Edited] (PIG-5342) Add setting to turn off bloom join combiner

    [ https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511706#comment-16511706 ] 

Rohini Palaniswamy edited comment on PIG-5342 at 6/13/18 9:56 PM:
------------------------------------------------------------------

Comments:
1) Bloom join is also ideal for a right outer join with the smaller dataset on the right, which replicated join does not support.
2) edge.setCombinerInMap(true); and edge.setCombinerInReducer(true); are redundant.
3) edge.partitionerClass = BloomFilterPartitioner.class; should be set only for the reducer case. The same applies to the key and value types.
4) Typo: resuleWithCombiner -> resultWithCombiner
5) The new NullableTuple() in bloomWriter.write(new NullableIntWritable(i), new NullableTuple(tuple)); can be avoided.


was (Author: rohini):
Comments:
1) Bloom join is also ideal for a right outer join with the smaller dataset on the right, which replicated join does not support.
2) edge.setCombinerInMap(true); and edge.setCombinerInReducer(true); are redundant.
3) edge.partitionerClass = BloomFilterPartitioner.class; should be set only for the reducer case. The same applies to the key and value types.
4) combineBloomOp is not used anymore and should be removed.
5) Typo: resuleWithCombiner -> resultWithCombiner
6) The new NullableTuple() in bloomWriter.write(new NullableIntWritable(i), new NullableTuple(tuple)); can be avoided.

> Add setting to turn off bloom join combiner
> -------------------------------------------
>
>                 Key: PIG-5342
>                 URL: https://issues.apache.org/jira/browse/PIG-5342
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Major
>         Attachments: PIG-5342-1.patch
>
>
> 1) Need a new setting, pig.bloomjoin.nocombiner, to turn off the combiner for bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) Mention in the documentation that bloom join is also ideal for a right outer join with the smaller dataset on the right. Replicated join supports only left outer join.
>  
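
For illustration only, a minimal Pig Latin sketch of how the proposed setting might be used together with a right outer bloom join. It assumes the property is adopted under the name given in the issue (pig.bloomjoin.nocombiner) and takes a boolean value; the relation names, input/output paths, and schemas are hypothetical.

```pig
-- Assumption: property name as proposed in PIG-5342; only worthwhile
-- when the join keys are (nearly) all unique, so the combiner has
-- nothing to merge.
set pig.bloomjoin.nocombiner true;

-- Hypothetical inputs: 'big_input' is large, 'small_input' is small.
big   = LOAD 'big_input'   AS (id:long, payload:chararray);
small = LOAD 'small_input' AS (id:long, name:chararray);

-- Right outer join with the smaller relation on the right, the case
-- the issue notes replicated join does not cover.
joined = JOIN big BY id RIGHT OUTER, small BY id USING 'bloom';

STORE joined INTO 'joined_output';
```

Whether skipping the combiner helps depends on key cardinality; with many duplicate keys the combiner would normally still be worth keeping.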



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)