Posted to dev@pig.apache.org by "Mark Wagner (JIRA)" <ji...@apache.org> on 2013/10/31 20:19:18 UTC

[jira] [Commented] (PIG-3555) Initial implementation of combiner optimization

    [ https://issues.apache.org/jira/browse/PIG-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810578#comment-13810578 ] 

Mark Wagner commented on PIG-3555:
----------------------------------

I think that's a good way to do it. One comment: Tez also runs combiners as part of OnFileSortedOutput (like the traditional mapred combiners). I'd propose we create a new "TezEdge" class to serve as a descriptor for edges, since this is likely an area where we'll be doing a lot of optimization in the future with Tez (streaming edges, shuffles with no sorting, etc.), and it would be good to keep that separate from TezOp. Every TezOperator can then keep track of its input and output TezEdges.
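
To make the proposal concrete, here is a minimal sketch of what an edge descriptor could look like. Everything in it is an assumption for illustration: the class names EdgeSketch and OperatorSketch, the DataMovement enum, and the use of a plain String as a stand-in for a combine plan are hypothetical and are not taken from the Pig or Tez code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical edge descriptor: per-edge settings live here instead of on the operator.
final class EdgeSketch {
    // Assumed edge flavors, loosely following the ones mentioned in the comment.
    enum DataMovement { SORTED_SHUFFLE, UNSORTED_SHUFFLE, STREAMING }

    final OperatorSketch source;
    final OperatorSketch destination;
    final DataMovement movement;
    // Stand-in for a combine plan; null means the edge has no combiner.
    final String combinePlan;

    EdgeSketch(OperatorSketch source, OperatorSketch destination,
               DataMovement movement, String combinePlan) {
        this.source = source;
        this.destination = destination;
        this.movement = movement;
        this.combinePlan = combinePlan;
    }

    boolean hasCombiner() {
        return combinePlan != null;
    }
}

// Hypothetical operator that knows its input and output edges.
final class OperatorSketch {
    final String name;
    private final List<EdgeSketch> inputs = new ArrayList<>();
    private final List<EdgeSketch> outputs = new ArrayList<>();

    OperatorSketch(String name) {
        this.name = name;
    }

    // Creates one edge and registers it on both endpoints.
    static EdgeSketch connect(OperatorSketch from, OperatorSketch to,
                              EdgeSketch.DataMovement movement, String combinePlan) {
        EdgeSketch edge = new EdgeSketch(from, to, movement, combinePlan);
        from.outputs.add(edge);
        to.inputs.add(edge);
        return edge;
    }

    List<EdgeSketch> inputEdges() {
        return Collections.unmodifiableList(inputs);
    }

    List<EdgeSketch> outputEdges() {
        return Collections.unmodifiableList(outputs);
    }
}
```

With this shape, a combiner becomes a property of one edge rather than of the whole operator, so other per-edge choices (sorted versus unsorted, streaming) could be added later without touching the operator class.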

> Initial implementation of combiner optimization
> -----------------------------------------------
>
>                 Key: PIG-3555
>                 URL: https://issues.apache.org/jira/browse/PIG-3555
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: tez-branch
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: tez-branch
>
>
> To support algebraic UDFs and others, a combiner is required. To start with, I am proposing the following initial implementation:
> * In Tez, the combiner runs as part of ShuffledMergedInput on edges, so multiple combine plans (one per edge) need to be registered in a destination vertex. Each vertex maps to a TezOperator in the Tez plan, so an array of combine plans will be stored in the TezOperator that maps to a destination vertex.
> * To register combine plans in a TezOperator, we will run a CombinerOptimizer on the Tez plan after TezCompiler generates it but before TezDagBuilder converts it into a DAG.
> * Finally, TezDagBuilder will insert the combine plans into the payload of ShuffledMergedInput while constructing a destination vertex.
> This initial implementation will allow us to run algebraic UDFs. In the future, we can implement more optimizations for limit, order-by, etc. on top of this.
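
The three steps quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: OpSketch and CombinerPassSketch are hypothetical names, combine plans are represented as plain strings, and the "algebraic" flag stands in for whatever check the real CombinerOptimizer would perform on a plan. None of it reflects the actual Pig or Tez APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a TezOperator (one per vertex).
final class OpSketch {
    final String name;
    // Assumed flag: true when the operator's aggregation could use a combiner.
    final boolean algebraic;
    final List<OpSketch> predecessors = new ArrayList<>();
    // One slot per incoming edge, filled by the optimizer pass; null = no combiner.
    final List<String> combinePlans = new ArrayList<>();

    OpSketch(String name, boolean algebraic) {
        this.name = name;
        this.algebraic = algebraic;
    }
}

final class CombinerPassSketch {
    // Step 2: runs on the compiled plan, before any DAG is built.
    // Registers one combine plan per incoming edge on each destination operator.
    static void optimize(List<OpSketch> plan) {
        for (OpSketch op : plan) {
            op.combinePlans.clear();
            for (OpSketch pred : op.predecessors) {
                String combine = op.algebraic
                        ? "combine(" + pred.name + "->" + op.name + ")"
                        : null;
                op.combinePlans.add(combine);
            }
        }
    }

    // Step 3: while constructing a destination vertex, copy each registered
    // combine plan into the payload of the input fed by the matching edge.
    static Map<String, String> buildInputPayloads(OpSketch dest) {
        Map<String, String> payloads = new LinkedHashMap<>();
        int n = Math.min(dest.predecessors.size(), dest.combinePlans.size());
        for (int i = 0; i < n; i++) {
            String combine = dest.combinePlans.get(i);
            if (combine != null) {
                payloads.put(dest.predecessors.get(i).name, combine);
            }
        }
        return payloads;
    }
}
```

Keeping the optimizer as a separate pass between compilation and DAG construction means the DAG-building step only copies what was already decided, which matches the ordering described in the issue.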



--
This message was sent by Atlassian JIRA
(v6.1#6144)