Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/03/28 00:05:16 UTC
[jira] [Created] (PIG-3847) Sort avoidance for group by and join
Rohini Palaniswamy created PIG-3847:
---------------------------------------
Summary: Sort avoidance for group by and join
Key: PIG-3847
URL: https://issues.apache.org/jira/browse/PIG-3847
Project: Pig
Issue Type: Sub-task
Reporter: Rohini Palaniswamy
Group by and join only require that records with the same key be grouped together. It is not necessary for the keys to be sorted. If we can have a Tez Input/Output implementation that does the grouping using a hash map (memory limits, spilling, etc. have to be handled), it could really speed up group by and join. Combiners on both the input and output side can also be fast if serialization/deserialization is not required, and such a combiner can be used instead of POPartialAgg.
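
To illustrate the idea only, here is a minimal in-memory sketch in Python of grouping, combining, and joining by hash map without ever ordering the keys. It is not Tez or Pig code: the function names (hash_group, hash_combine, hash_join) are hypothetical, records are assumed to be (key, value) pairs, and memory limits and spilling are ignored entirely.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# Assumption for this sketch: a record is a (key, value) pair.
Record = Tuple[Hashable, int]


def hash_group(records: Iterable[Record]) -> Dict[Hashable, List[int]]:
    """Group values by key with a hash map; keys are never sorted."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return dict(groups)


def hash_combine(
    records: Iterable[Record], combine: Callable[[int, int], int]
) -> Dict[Hashable, int]:
    """Combiner-style partial aggregation: one running value per key,
    updated in place on live objects with no serialization step."""
    partial: Dict[Hashable, int] = {}
    for key, value in records:
        partial[key] = combine(partial[key], value) if key in partial else value
    return partial


def hash_join(
    left: Iterable[Record], right: Iterable[Record]
) -> List[Tuple[Hashable, int, int]]:
    """Inner join: build a hash map over the right side, probe with the left."""
    right_groups = hash_group(right)
    return [
        (key, left_value, right_value)
        for key, left_value in left
        for right_value in right_groups.get(key, [])
    ]


records = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]
grouped = hash_group(records)
summed = hash_combine(records, lambda x, y: x + y)
joined = hash_join([("a", 10), ("d", 11)], [("a", 2), ("a", 5)])
```

Each record costs one hash lookup rather than taking part in an O(n log n) sort. A real implementation would additionally have to bound the size of the map and spill partitions to disk, which this sketch leaves out.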
--
This message was sent by Atlassian JIRA
(v6.2#6252)