Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/02/13 15:04:11 UTC

[jira] [Created] (PIG-4420) Support for map side cross similar to replicate join

Rohini Palaniswamy created PIG-4420:
---------------------------------------

             Summary: Support for map side cross similar to replicate join
                 Key: PIG-4420
                 URL: https://issues.apache.org/jira/browse/PIG-4420
             Project: Pig
          Issue Type: New Feature
            Reporter: Rohini Palaniswamy


   Our CROSS implementation is very costly. We recently had a case where a user was doing a CROSS of 30 million records against 3K records, and it caused a lot of disk error exceptions during the shuffle phase. We need to add support for a map-side cross, with syntax such as:

C = CROSS A, B using 'replicate';

The smaller table can be loaded into an in-memory list (replicated join uses a hashmap for this) and iterated over for each record in the bigger table. Since no shuffle is needed, this should give a major performance boost and drastically reduce resource usage.
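
To illustrate the idea, here is a minimal sketch in Python of what each map task would do. It is not Pig code and not a proposed implementation; the function names load_replicated and map_side_cross are hypothetical, and records are modeled as plain tuples. It assumes the smaller relation fits in memory.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple  # assumption: a record is modeled as a plain tuple


def load_replicated(small: Iterable[Record]) -> List[Record]:
    """Materialize the smaller relation in memory, once per map task.

    A list is enough here: unlike a replicated join, a cross has no
    join key to look up, so no hashmap is needed.
    """
    return list(small)


def map_side_cross(big: Iterable[Record],
                   replicated: List[Record]) -> Iterator[Record]:
    """Stream the bigger relation and pair each record with every
    in-memory record. Output fields are those of the bigger relation
    followed by those of the smaller one."""
    for left in big:
        for right in replicated:
            yield left + right


if __name__ == "__main__":
    A = [(1, "a"), (2, "b"), (3, "c")]   # stands in for the bigger relation
    B = [("x",), ("y",)]                 # stands in for the smaller relation
    for row in map_side_cross(A, load_replicated(B)):
        print(row)
```

The bigger relation is only streamed, never buffered, so memory use is bounded by the size of the smaller relation, and nothing has to go through the shuffle.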



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)