Posted to dev@pig.apache.org by "Ashwin Shankar (JIRA)" <ji...@apache.org> on 2014/09/26 19:27:33 UTC

[jira] [Created] (PIG-4203) Implement sparse JOIN on table using bloom filter

Ashwin Shankar created PIG-4203:
-----------------------------------

             Summary: Implement sparse JOIN on table using bloom filter
                 Key: PIG-4203
                 URL: https://issues.apache.org/jira/browse/PIG-4203
             Project: Pig
          Issue Type: New Feature
            Reporter: Ashwin Shankar


Currently, when users want to join tables where one of the tables is sparse (i.e. only a small percentage of records match during the join), they can use bloom filters to make the join efficient (see PIG-2328).
However, this involves writing some code and calling a couple of UDFs: BuildBloom and Bloom.
It would be great if the bloom filters were built automatically in these cases, i.e. Pig inserts them into the MR plan when users specify a keyword.
I propose calling this keyword "sparse", if no one has any objections.
E.g.: C = JOIN A BY a1, B BY b1 USING 'sparse';

The assumption here is that the table on the right side of the join is the smaller table.
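
To make the idea concrete, here is a minimal conceptual sketch in Python of a bloom-filter-assisted join. It is not Pig's implementation and not the code from PIG-2328. The class BloomFilter, the function sparse_join, the filter size, the hash scheme, and the sample tables are all hypothetical names and values chosen for illustration. The filter is built from the keys of the smaller (right) table, used to discard rows of the larger (left) table that cannot match, and the regular join then runs only on the surviving rows.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def sparse_join(left, right, left_key, right_key):
    """Join rows (tuples) of left and right on the given column indexes.

    right is assumed to be the smaller table, as in the proposal.
    Returns the joined rows and the number of left rows that passed the filter.
    """
    # Step 1: build the filter from the join keys of the smaller table.
    bloom = BloomFilter()
    for row in right:
        bloom.add(row[right_key])

    # Step 2: drop rows of the larger table that cannot possibly match.
    candidates = [row for row in left if bloom.might_contain(row[left_key])]

    # Step 3: ordinary join on the survivors. An in-memory index stands in
    # for the real join here; it also removes any false positives.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = [l + r for l in candidates for r in index.get(l[left_key], [])]
    return joined, len(candidates)


# Hypothetical data: A is large, B is small, and only two keys overlap.
A = [(i, f"a{i}") for i in range(1000)]
B = [(5, "x"), (42, "y"), (5000, "z")]
joined, survivors = sparse_join(A, B, 0, 0)
print(joined)
print(f"{survivors} of {len(A)} rows of A reached the join")
```

The benefit comes from step 2: most rows of the larger table are discarded before the join, so far fewer records need to be shuffled. False positives only cost some extra rows in step 3 and never change the result.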



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)