Posted to dev@pig.apache.org by "Ashwin Shankar (JIRA)" <ji...@apache.org> on 2014/09/26 19:28:34 UTC

[jira] [Updated] (PIG-4203) Implement sparse JOIN on tables using bloom filter

     [ https://issues.apache.org/jira/browse/PIG-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashwin Shankar updated PIG-4203:
--------------------------------
    Summary: Implement sparse JOIN on tables using bloom filter  (was: Implement sparse JOIN on table using bloom filter)

> Implement sparse JOIN on tables using bloom filter
> --------------------------------------------------
>
>                 Key: PIG-4203
>                 URL: https://issues.apache.org/jira/browse/PIG-4203
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ashwin Shankar
>
> Currently, when users want to join tables where one of the tables is sparse (i.e., only a small percentage of its records match during the join), they can use bloom filters to make the join efficient (see PIG-2328).
> However, this involves writing some code and calling a couple of UDFs: BuildBloom and Bloom.
> It would be great if the bloom filters were built automatically in these cases, i.e., Pig inserts them into the MR plan when users specify a keyword.
> I propose calling this keyword "sparse" if no one has any objections.
> E.g.: C = JOIN A BY a1, B BY b1 USING 'sparse';
> The assumption here is that the table on the right side of the join is the smaller table.
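
For readers unfamiliar with the technique, here is a minimal Python sketch of the general idea behind a bloom-filter join: build a filter from the join keys of the smaller (right) table, use it to discard most non-matching rows of the larger table, then join what remains. This is not Pig's implementation and does not use the BuildBloom or Bloom UDFs; the names BloomFilter, sparse_join, num_bits and num_hashes are hypothetical and chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key!r}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def sparse_join(a_rows, b_rows, a_key, b_key):
    """Join tuples in a_rows with tuples in b_rows; b_rows is the smaller table."""
    # Step 1: build the filter from the smaller table's join keys.
    bloom = BloomFilter()
    for row in b_rows:
        bloom.add(row[b_key])

    # Step 2: drop rows of the larger table that cannot possibly match.
    a_candidates = [row for row in a_rows if bloom.might_contain(row[a_key])]

    # Step 3: exact join on the survivors; this removes any false positives.
    b_index = {}
    for row in b_rows:
        b_index.setdefault(row[b_key], []).append(row)
    return [a + b for a in a_candidates for b in b_index.get(a[a_key], [])]


A = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]  # larger table, key in column 0
B = [(2, "p"), (4, "q"), (4, "r")]            # smaller table, key in column 0
C = sparse_join(A, B, a_key=0, b_key=0)
```

In a distributed setting the benefit comes from step 2: rows filtered out of the larger table never need to be shuffled to the reducers, and the final exact join keeps the result correct even when the filter reports false positives.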



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)