Posted to issues@tez.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/05/05 00:13:07 UTC

[jira] [Comment Edited] (TEZ-2221) VertexGroup name should be unique

    [ https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527425#comment-14527425 ] 

Rohini Palaniswamy edited comment on TEZ-2221 at 5/4/15 10:12 PM:
------------------------------------------------------------------

bq. what happens if someone does the following. This should also be disallowed. Correct?
{code}
dag.createVertexGroup("group_1", v1,v2);
dag.createVertexGroup("group_2", v1,v2);
{code}

   [~daijy] pointed out that this breaks a lot of Pig scripts on Tez with UnionOptimizer, because each vertex has multiple outputs and we currently create a vertex group for each of those outputs.  For example, take union followed by order by: there will be one sample output and one partitioner output from the union vertex, going to two different downstream vertices. With the UnionOptimizer, the union is removed and two vertex groups are created.  If this is disallowed, we will have to reuse the same vertex group to route multiple outputs. The GroupInputEdge.create(VertexGroup inputVertexGroup, Vertex outputVertex, EdgeProperty edgeProperty, InputDescriptor mergedInput) API seems to allow that.  Will doing that work, and is that how you want us to construct the plan?

Consider another case: union followed by replicate join with two tables, followed by order by.  The plan will consist of 8 vertices: V1 (Load), V2 (Load), V3 (union), V4 (Replicate join T1 load), V5 (Replicate join T2 load), V6 (partitioner), V7 (sampler), and V8 (order by), with edges V1,V2->V3, V4->V3, V5->V3, V3->V6, V3->V7, V7->V6, V6->V8.  The optimized plan will become V4->(V1,V2 vertex group), V5->(V1,V2 vertex group), (V1,V2 vertex group)->V6, (V1,V2 vertex group)->V7, V7->V6, V6->V8. So is using one vertex group for routing multiple outputs and multiple inputs how we are expected to construct the plan?
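
To make the question concrete, here is a minimal sketch of the optimized plan above, assuming the restriction from the quoted question (a second vertex group over the same members is rejected). PlanModel, its methods, and the group name "union_group" are hypothetical stand-ins for illustration only; this is not the Tez DAG API, and edges are recorded as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReusedVertexGroupSketch {

    /** Hypothetical in-memory plan model; NOT the Tez DAG class. */
    public static final class PlanModel {
        // Assumed restriction: at most one group per distinct member set.
        private final Map<Set<String>, String> groupNameByMembers = new HashMap<>();
        private final List<String> edges = new ArrayList<>();

        public String createVertexGroup(String name, String... members) {
            // TreeSet makes the key independent of the order the members are given in.
            Set<String> key = new TreeSet<>(Arrays.asList(members));
            String existing = groupNameByMembers.putIfAbsent(key, name);
            if (existing != null) {
                throw new IllegalStateException(
                    "members " + key + " are already grouped as " + existing);
            }
            return name;
        }

        public void addEdge(String from, String to) {
            edges.add(from + "->" + to);
        }

        public List<String> edges() {
            return Collections.unmodifiableList(edges);
        }
    }

    /** Builds the optimized plan from the comment with a single reused group. */
    public static PlanModel buildOptimizedPlan() {
        PlanModel plan = new PlanModel();
        String group = plan.createVertexGroup("union_group", "V1", "V2");
        plan.addEdge("V4", group);  // replicate join T1 load feeds the group
        plan.addEdge("V5", group);  // replicate join T2 load feeds the group
        plan.addEdge(group, "V6");  // partitioner output
        plan.addEdge(group, "V7");  // sample output
        plan.addEdge("V7", "V6");
        plan.addEdge("V6", "V8");
        return plan;
    }

    public static void main(String[] args) {
        PlanModel plan = buildOptimizedPlan();
        System.out.println(plan.edges());
        try {
            // Same members under a different name: rejected under the assumed rule.
            plan.createVertexGroup("group_2", "V2", "V1");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the single group appears in four edges, two incoming and two outgoing, which is the shape the question asks about. Whether Tez itself handles such a plan correctly is exactly what is being asked, so the sketch does not answer it.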




was (Author: rohini):
bq. what happens if someone does the following. This should also be disallowed. Correct?
{code}
dag.createVertexGroup("group_1", v1,v2);
dag.createVertexGroup("group_2", v1,v2);
{code}

   [~daijy] pointed out that this breaks a lot of Pig scripts on Tez with UnionOptimizer, because each vertex has multiple outputs and we currently create a vertex group for each of those outputs.  For example, take union followed by order by: there will be one sample output and one partitioner output from the union vertex, going to two different downstream vertices. With the UnionOptimizer, the union is removed and two vertex groups are created.  If this is disallowed, we will have to reuse the same vertex group to route multiple outputs. The GroupInputEdge.create(VertexGroup inputVertexGroup, Vertex outputVertex, EdgeProperty edgeProperty, InputDescriptor mergedInput) API seems to allow that.  Will doing that work, and is that how you want us to construct the plan?

Consider another case: union followed by replicate join with two tables, followed by order by.  The plan will consist of 8 vertices: V1 (Load), V2 (Load), V3 (union), V4a (Replicate join T1 load), V4b (Replicate join T2 load), V5 (partitioner), V6 (sampler), and V7 (order by), with edges V1,V2->V3, V4a->V3, V4b->V3, V3->V5, V3->V6, V6->V5, V5->V7.  The optimized plan will become V4a->(V1,V2 vertex group), V4b->(V1,V2 vertex group), (V1,V2 vertex group)->V5, (V1,V2 vertex group)->V6, V6->V5, V5->V7. So is using one vertex group for routing multiple outputs and multiple inputs how we are expected to construct the plan?



> VertexGroup name should be unique
> ---------------------------------
>
>                 Key: TEZ-2221
>                 URL: https://issues.apache.org/jira/browse/TEZ-2221
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>             Fix For: 0.7.0, 0.5.4, 0.6.1
>
>         Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, TEZ-2221-4.patch
>
>
> VertexGroupCommitStartedEvent and VertexGroupCommitFinishedEvent use the vertex group name to identify the vertex group commit, so vertex groups with the same name will conflict. However, the current equals and hashCode of VertexGroup use both the vertex group name and the member names.
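
The mismatch described in the issue can be illustrated with a minimal sketch. The Group class below is a hypothetical stand-in, not the Tez VertexGroup class: its equals and hashCode use the name plus the member names, as the description says, while a map keyed by name alone plays the role of the commit events.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class VertexGroupNameCollisionSketch {

    /** Hypothetical stand-in for a vertex group; NOT the Tez VertexGroup class. */
    public static final class Group {
        final String name;
        final Set<String> members;

        public Group(String name, String... members) {
            this.name = name;
            this.members = new HashSet<>(Arrays.asList(members));
        }

        // Identity is name plus member names, as in the issue description.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Group)) return false;
            Group other = (Group) o;
            return name.equals(other.name) && members.equals(other.members);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, members);
        }
    }

    /** Counts groups that are distinct according to equals/hashCode. */
    public static int distinctByEquals(List<Group> groups) {
        return new HashSet<>(groups).size();
    }

    /** Counts groups that stay distinct when identified by name only. */
    public static int distinctByName(List<Group> groups) {
        Map<String, Group> byName = new HashMap<>();
        for (Group g : groups) {
            byName.put(g.name, g); // a later group silently replaces an earlier one
        }
        return byName.size();
    }

    public static void main(String[] args) {
        List<Group> groups = Arrays.asList(
            new Group("group_1", "v1", "v2"),
            new Group("group_1", "v3", "v4"));
        System.out.println("distinct by equals: " + distinctByEquals(groups));
        System.out.println("distinct by name:   " + distinctByName(groups));
    }
}
```

Two groups that share a name but differ in members count as different objects under equals, yet they collapse into one entry once they are looked up by name, which is the conflict the issue title refers to.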



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)