Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2015/05/21 08:13:00 UTC

[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

    [ https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553693#comment-14553693 ] 

Jeff Zhang commented on TEZ-391:
--------------------------------

The table below shows the edge types we may need to support. Rows are the source of the edge, columns are the destination.
| Source \ Destination | Vertex | VertexGroup |
| Vertex | Common Edge | SharedOutputEdge |
| VertexGroup | GroupInputEdge | Both SharedOutputEdge & GroupInputEdge (not implemented yet) |
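
To make the table concrete, here is a minimal sketch of the same lookup in Java. It assumes the rows are the source endpoint and the columns are the destination endpoint. EdgeKinds, Endpoint and classify are hypothetical names used only for illustration; they are not part of the Tez API or of this patch.

```java
// Hypothetical illustration only; none of these names exist in Tez.
final class EdgeKinds {
    enum Endpoint { VERTEX, VERTEX_GROUP }

    // Assumption: table rows are the source endpoint, columns the destination.
    static String classify(Endpoint source, Endpoint destination) {
        boolean groupSource = source == Endpoint.VERTEX_GROUP;
        boolean groupDestination = destination == Endpoint.VERTEX_GROUP;
        if (groupSource && groupDestination) {
            // Needs both mechanisms at once; the comment says this is not implemented yet.
            return "SharedOutputEdge + GroupInputEdge";
        }
        if (groupSource) {
            return "GroupInputEdge";
        }
        if (groupDestination) {
            return "SharedOutputEdge";
        }
        return "Common Edge";
    }

    public static void main(String[] args) {
        // Print every cell of the table.
        for (Endpoint source : Endpoint.values()) {
            for (Endpoint destination : Endpoint.values()) {
                System.out.println(source + " -> " + destination + ": "
                        + classify(source, destination));
            }
        }
    }
}
```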

The main changes in this patch:
* Currently SharedOutputEdge only supports One-to-One and Broadcast. ScatterGather requires the two downstream vertices to have the same parallelism, otherwise the shuffle will break. I made some changes to get ScatterGather working, but it still needs more work, especially on reducer auto-parallelism. For Pig's usage scenario, One-to-One and Broadcast should be sufficient for now.
* Workflow for the shared output edge
** The shared output edge is specified when building the DAG on the client.
** The AM gets the shared output edge from the DAGPlan and passes this SharedOutputSpec through the TaskSpec to TezChild.
** LogicalIOProcessorRuntimeTask gets the TaskSpec, which contains the SharedOutputSpec. It creates the corresponding SharedLogicOutput & SharedOutputContext, which are very similar to the common LogicOutput & OutputContext. The only difference is that SharedLogicOutput & SharedOutputContext are associated with the downstream vertex group name rather than the downstream vertex name. The key point is that although only one copy of the DataMovementEvent is generated, that one copy is sent to each member of the downstream vertex group. (This is done in LogicalIOProcessorRuntimeTask.close().)
* Refactoring changes
** I renamed many occurrences of MergedInput to GroupedInput to align with SharedOutput
** Rename VertexImpl#sharedOutput to VertexImpl#mergedOutput 
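
The key step of the shared output workflow (one event generated, then delivered to every member of the downstream vertex group) can be sketched roughly as follows. This is a self-contained model, not the code in the patch: SharedOutputFanOut, Event and route are hypothetical names, and the real logic lives in LogicalIOProcessorRuntimeTask.close() as described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the fan-out; none of these types are Tez classes.
final class SharedOutputFanOut {

    // Stand-in for a data movement event, created once per shared output.
    static final class Event {
        final String sourceVertex;
        final int sourceTaskIndex;
        final String payload;

        Event(String sourceVertex, int sourceTaskIndex, String payload) {
            this.sourceVertex = sourceVertex;
            this.sourceTaskIndex = sourceTaskIndex;
            this.payload = payload;
        }
    }

    // Delivers the same event instance to each member of the downstream vertex group.
    static Map<String, List<Event>> route(Event event, List<String> groupMembers) {
        Map<String, List<Event>> inboxByVertex = new LinkedHashMap<>();
        for (String member : groupMembers) {
            inboxByVertex.computeIfAbsent(member, k -> new ArrayList<>()).add(event);
        }
        return inboxByVertex;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>();
        group.add("v2");
        group.add("v3");
        // The output is written once, so only one event is created.
        Event event = new Event("v1", 0, "location-of-shared-output");
        Map<String, List<Event>> routed = route(event, group);
        for (Map.Entry<String, List<Event>> entry : routed.entrySet()) {
            System.out.println(entry.getKey() + " receives "
                    + entry.getValue().size() + " event(s) from "
                    + entry.getValue().get(0).sourceVertex);
        }
    }
}
```

The point of the sketch is that the upstream output is produced once and only the routing is duplicated, which is what avoids writing the output multiple times.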
 

> SharedEdge - Support for passing same output from a vertex as input to two different vertices
> ---------------------------------------------------------------------------------------------
>
>                 Key: TEZ-391
>                 URL: https://issues.apache.org/jira/browse/TEZ-391
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Jeff Zhang
>         Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch, TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch, TEZ-391-WIP-4.patch, TEZ-391-WIP-5.patch, TEZ-391-WIP-6.patch, TEZ-391-WIP-7.patch
>
>
>   We need this for a lot of use cases, such as when multi-query is turned off and when optimizing unions. Currently those are BROADCAST or ONE-ONE edges and we write the output multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)