Posted to dev@pig.apache.org by "Ashwin Shankar (JIRA)" <ji...@apache.org> on 2014/09/26 19:28:34 UTC
[jira] [Updated] (PIG-4203) Implement sparse JOIN on tables using bloom filter
[ https://issues.apache.org/jira/browse/PIG-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashwin Shankar updated PIG-4203:
--------------------------------
Summary: Implement sparse JOIN on tables using bloom filter (was: Implement sparse JOIN on table using bloom filter)
> Implement sparse JOIN on tables using bloom filter
> --------------------------------------------------
>
> Key: PIG-4203
> URL: https://issues.apache.org/jira/browse/PIG-4203
> Project: Pig
> Issue Type: New Feature
> Reporter: Ashwin Shankar
>
> Currently, when users want to join tables where one of the tables is sparse (i.e., only a small percentage of records match during the join), they can use bloom filters to make the join efficient (see PIG-2328).
> However, this involves writing some code and calling a couple of UDFs: BuildBloom and Bloom.
> It would be great if the bloom filters were built automatically in these cases, i.e., Pig inserts them into the MR plan when users specify a keyword.
> I propose calling this keyword "sparse" if no one has any objections.
> E.g.: C = JOIN A BY a1, B BY b1 USING 'sparse';
> The assumption here is that the table on the right side of the join is the smaller table.
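For readers unfamiliar with the technique, here is a minimal Python sketch of the idea behind a bloom-filter join. It is not Pig's implementation and does not reflect the BuildBloom or Bloom UDF signatures; the names `BloomFilter` and `sparse_join`, the filter size, and the hash scheme are all assumptions made for illustration. The smaller (right) table's keys are loaded into the filter, the larger (left) table is pre-filtered against it, and only the surviving rows take part in the actual join.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical helper, not Pig's Bloom UDF)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple at the cost of memory.
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def sparse_join(left, right, left_key, right_key):
    """Inner-join tuple rows, pre-filtering `left` with a bloom filter
    built from the keys of `right` (assumed to be the smaller table)."""
    bloom = BloomFilter()
    index = {}
    for row in right:
        bloom.add(row[right_key])
        index.setdefault(row[right_key], []).append(row)

    # Most non-matching rows of the large table are dropped here.
    candidates = [row for row in left if bloom.might_contain(row[left_key])]

    # The exact join discards any false positives the filter let through.
    joined = [l + r for l in candidates for r in index.get(l[left_key], [])]
    return candidates, joined


# Mirrors C = JOIN A BY a1, B BY b1 USING 'sparse', with B as the small table.
A = [(i, f"a{i}") for i in range(1000)]
B = [(7, "x"), (42, "y")]
candidates, C = sparse_join(A, B, 0, 0)
```

In a MapReduce setting the benefit comes from the pre-filter step: rows of the large table that cannot match are dropped before they are shuffled to the join. The final exact match is still required because a bloom filter can report false positives, though never false negatives.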
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)