Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2015/01/19 11:24:35 UTC

[jira] [Commented] (TEZ-391) SharedEdge - Support for passing same output from a vertex as input to two different vertices

    [ https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282355#comment-14282355 ] 

Jeff Zhang commented on TEZ-391:
--------------------------------

Attached a patch for SharedEdge:
* Adds a new API in Edge to create a shared edge:
{code}
public Edge createSharedEdge(Vertex outputVertex) 
{code}
* Currently it only supports One-to-One and Broadcast. ScatterGather requires the 2 downstream vertices to have the same parallelism, otherwise shuffle will break. I made some changes to get ScatterGather working, but it still needs more work, especially around reducer auto-parallelism.
* Adds an example in tez-example to show the usage (SharedEdgeExample).
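
To illustrate the ScatterGather restriction above, here is a minimal toy sketch in plain Java. It is not taken from the patch and uses no Tez classes; SharedOutputSketch, partition, and the canShare* methods are hypothetical names made up for this illustration. It assumes the producer hash-partitions its output into one partition per consumer task, so a single partitioned output can only serve two consumers that run the same number of tasks, while a broadcast output is read whole by every consumer task.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are Tez classes or Tez APIs.
final class SharedOutputSketch {

    // Hash-partition records into numPartitions buckets, mimicking how a
    // scatter-gather producer splits its output by the consumer's task count.
    static List<List<String>> partition(List<String> records, int numPartitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String record : records) {
            // floorMod keeps the bucket index non-negative for negative hash codes.
            parts.get(Math.floorMod(record.hashCode(), numPartitions)).add(record);
        }
        return parts;
    }

    // Assumption of this sketch: the output is partitioned once, so both
    // consumers must expect the same number of partitions to share it.
    static boolean canShareScatterGather(int parallelismA, int parallelismB) {
        return parallelismA == parallelismB;
    }

    // Every consumer task reads the whole output, so consumer parallelism
    // does not constrain sharing in this model.
    static boolean canShareBroadcast(int parallelismA, int parallelismB) {
        return true;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c", "d", "e");
        System.out.println("3 partitions: " + partition(records, 3));
        System.out.println("share scatter-gather (4, 4): " + canShareScatterGather(4, 4));
        System.out.println("share scatter-gather (4, 2): " + canShareScatterGather(4, 2));
        System.out.println("share broadcast (4, 2): " + canShareBroadcast(4, 2));
    }
}
```

In this model, sharing with mismatched parallelism would need either a second partitioning pass or consumers that read several partitions each, which is roughly where reducer auto-parallelism makes things harder.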

Although this patch works, on further thought I think using VertexGroup may be more natural and easier to understand: we would just make the 2 downstream vertices a vertex group and connect the upstream vertex to that group. VertexGroup is already used for shared output, so it is natural to have it support shared input as well. I will attach a new patch using VertexGroup later.




> SharedEdge - Support for passing same output from a vertex as input to two different vertices
> ---------------------------------------------------------------------------------------------
>
>                 Key: TEZ-391
>                 URL: https://issues.apache.org/jira/browse/TEZ-391
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Jeff Zhang
>         Attachments: TEZ-391-WIP-1.patch
>
>
>   We need this for a lot of use cases, for example when multi-query is turned off and when optimizing unions. Currently those are BROADCAST or ONE-ONE edges, and we write the output multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)