Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/03/03 23:04:22 UTC

[jira] [Updated] (PIG-3775) Use unsorted shuffle in Union, Orderby, Skewed Join to improve performance in Tez

     [ https://issues.apache.org/jira/browse/PIG-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3775:
------------------------------------

    Summary: Use unsorted shuffle in Union, Orderby, Skewed Join to improve performance in Tez  (was: Use unsorted shuffle in Union, Orderby, Skewed Join to improve performance)

> Use unsorted shuffle in Union, Orderby, Skewed Join to improve performance in Tez
> ---------------------------------------------------------------------------------
>
>                 Key: PIG-3775
>                 URL: https://issues.apache.org/jira/browse/PIG-3775
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>              Labels: gsoc2014
>             Fix For: tez-branch
>
>
> When implementing Pig union, we need to gather data from two or more upstream vertices without sorting. Each vertex may itself consist of several tasks. The same can be done for the partitioner vertex in orderby and skewed join, instead of using a 1-1 edge, for some cases of parallelism.
> TEZ-661 has been created to add a custom output and input for this in Tez. It is not currently among the Tez team's priorities, but it is important for us because it will give good performance gains. We can write the custom input/output, contribute it to Tez, and make the corresponding changes in Pig. Marking this as a candidate for GSOC 2014.
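
To illustrate the difference the description draws, here is a minimal in-memory sketch in Java (16 or later, for records and Stream.toList). It is not Tez code and does not use the Tez API. The KV record and the sortedGather, unsortedGather and keys methods are hypothetical stand-ins. sortedGather models a sorted shuffle edge, and unsortedGather models the kind of unsorted input/output proposed in TEZ-661. A union only needs every upstream record to arrive, in any order, so the unsorted path skips the O(n log n) sort entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy in-memory model of two ways a downstream vertex can receive data.
// Not Tez code: KV, sortedGather, unsortedGather and keys are hypothetical names.
public class ShuffleSketch {

    // A key/value record emitted by one upstream task (hypothetical type).
    record KV(String key, int value) {}

    // Sorted shuffle: collect every upstream partition, then sort by key,
    // as a sorted (MapReduce-style) edge would before handing data over.
    static List<KV> sortedGather(List<List<KV>> upstream) {
        List<KV> all = unsortedGather(upstream); // fresh mutable list
        all.sort(Comparator.comparing(KV::key)); // O(n log n), stable
        return all;
    }

    // Unsorted gather: append upstream partitions in arrival order.
    // A union needs every record but no ordering, so no sort is done.
    static List<KV> unsortedGather(List<List<KV>> upstream) {
        List<KV> all = new ArrayList<>();
        for (List<KV> part : upstream) {
            all.addAll(part);
        }
        return all;
    }

    // Helper that extracts just the keys of a result, for printing.
    static List<String> keys(List<KV> records) {
        return records.stream().map(KV::key).toList();
    }

    public static void main(String[] args) {
        // Two upstream vertices; the second one ran as two tasks.
        List<List<KV>> upstream = List.of(
                List.of(new KV("b", 1), new KV("a", 2)), // vertex A, task 0
                List.of(new KV("c", 3)),                 // vertex B, task 0
                List.of(new KV("a", 4)));                // vertex B, task 1
        System.out.println(keys(unsortedGather(upstream))); // [b, a, c, a]
        System.out.println(keys(sortedGather(upstream)));   // [a, a, b, c]
    }
}
```

Both paths deliver the same four records. Only the sorted path pays for ordering that a union never uses.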



--
This message was sent by Atlassian JIRA
(v6.2#6252)