Posted to dev@pig.apache.org by "Ashwin Shankar (JIRA)" <ji...@apache.org> on 2014/09/26 19:29:33 UTC

[jira] [Commented] (PIG-4203) Implement sparse JOIN on tables using bloom filter

    [ https://issues.apache.org/jira/browse/PIG-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149677#comment-14149677 ] 

Ashwin Shankar commented on PIG-4203:
-------------------------------------

Working on this.

> Implement sparse JOIN on tables using bloom filter
> --------------------------------------------------
>
>                 Key: PIG-4203
>                 URL: https://issues.apache.org/jira/browse/PIG-4203
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ashwin Shankar
>
> Currently, when users want to join tables where one of the tables is sparse (i.e., only a small percentage of records match during the join), they can use bloom filters to make the join efficient (see PIG-2328).
> However, this involves writing some code and calling a couple of UDFs: BuildBloom and Bloom.
> It would be great if the bloom filters were built automatically in these cases, i.e., Pig automatically inserts them into the MR plan when users specify some keyword.
> Calling this keyword "sparse" if no one has any objections.
> E.g.: C = JOIN A BY a1, B BY b1 USING 'sparse';
> The assumption here is that the table mentioned on the right side of the join is the smaller table.
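
For context, the manual approach that the proposed keyword would replace looks roughly like the sketch below. It follows the usage pattern documented for the built-in BuildBloom and Bloom UDFs. The input and output paths, the intermediate aliases, and the BuildBloom arguments (hash type, expected number of elements, desired false-positive rate) are illustrative assumptions, not values taken from this issue.

```pig
-- Sketch of the manual bloom-filter join; all paths and parameter values are hypothetical.
-- BuildBloom arguments: hash type, expected number of elements, desired false-positive rate.
DEFINE bb BuildBloom('jenkins', '1000', '0.01');

-- B is the smaller (right-side) table: build the filter over its join key.
B = LOAD 'small_table' AS (b1, b2);
B_all = GROUP B ALL;
B_bloom = FOREACH B_all GENERATE bb(B.b1);
STORE B_bloom INTO 'b1_bloom';

-- Run the statements above so the filter file exists before Bloom reads it.
exec;

-- Drop rows of the larger table whose key cannot match, then join as usual.
DEFINE bloom Bloom('b1_bloom');
A = LOAD 'big_table' AS (a1, a2);
A_filtered = FILTER A BY bloom(a1);
C = JOIN A_filtered BY a1, B BY b1;
STORE C INTO 'joined';
```

Because a bloom filter can return false positives but never false negatives, the FILTER step may let some non-matching rows through; the JOIN that follows still removes them, so the result is the same as a plain join.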



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)