Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2014/03/14 20:24:42 UTC

[jira] [Commented] (TEZ-933) Race in getting source / destination numTasks on an Edge

    [ https://issues.apache.org/jira/browse/TEZ-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935491#comment-13935491 ] 

Siddharth Seth commented on TEZ-933:
------------------------------------

There are several options:
1) Wait for all downstream vertices to finish initializing before starting a vertex. This unnecessarily delays many vertices, and it does not cover the case where parallelism is set after initialization.

2) For now this affects only the ScatterGatherEdgeManager, since that's the only one which refers to numTasks. That class alone could be changed, but that isn't a long-term solution.

3) EdgeManagers should get access to the initial parallelism as well as the post-initialization / post-setParallelism numTasks. This could be done either by (a) changing the EdgeManager API to give access to the initially configured numTasks as well as the current value, or by (b) setting the numTasks in the Vertex up front.

For now I'm planning on going with option 3 (either 3a or 3b), since that's the most correct option (the edge plugin eventually needs to decide which value it wants to use). [~hitesh], thoughts on this?
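
To make option 3 concrete, here is a rough sketch of an edge manager that can see both the initially configured numTasks and the current value. All names here (EdgeVertexInfo, SketchScatterGatherEdgeManager and their methods) are hypothetical and only for illustration; this is not the existing Tez API. Which value a given plugin should use is also an assumption: the sketch has the scatter-gather manager size source outputs by the initial value.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical view of a vertex as seen by an edge manager (not a Tez class).
final class EdgeVertexInfo {
    // Value configured when the DAG was built; never changes.
    private final int initialNumTasks;
    // Value after initialization / setParallelism; may change at runtime.
    private final AtomicInteger currentNumTasks;

    EdgeVertexInfo(int initialNumTasks) {
        this.initialNumTasks = initialNumTasks;
        this.currentNumTasks = new AtomicInteger(initialNumTasks);
    }

    int getInitialNumTasks() { return initialNumTasks; }

    int getCurrentNumTasks() { return currentNumTasks.get(); }

    // Stand-in for a parallelism change on the vertex.
    void setParallelism(int numTasks) { currentNumTasks.set(numTasks); }
}

// Hypothetical edge manager that decides explicitly which value to use.
final class SketchScatterGatherEdgeManager {
    private final EdgeVertexInfo destination;

    SketchScatterGatherEdgeManager(EdgeVertexInfo destination) {
        this.destination = destination;
    }

    // Assumption: source outputs are sized by the initially configured
    // destination parallelism, so a later setParallelism cannot change
    // the answer between calls. The index is unused in this sketch.
    int getNumSourceTaskPhysicalOutputs(int sourceTaskIndex) {
        return destination.getInitialNumTasks();
    }

    // The current value stays available for decisions that need it.
    int getNumDestinationTasks() {
        return destination.getCurrentNumTasks();
    }
}

class EdgeNumTasksDemo {
    public static void main(String[] args) {
        EdgeVertexInfo destination = new EdgeVertexInfo(10);
        SketchScatterGatherEdgeManager manager =
            new SketchScatterGatherEdgeManager(destination);

        destination.setParallelism(4); // parallelism changed after setup

        System.out.println(manager.getNumSourceTaskPhysicalOutputs(0)); // 10
        System.out.println(manager.getNumDestinationTasks());           // 4
    }
}
```

The point is only that the plugin reads a value that cannot shift underneath it, instead of calling getTotalTasks() on a vertex whose state may still change.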

> Race in getting source / destination numTasks on an Edge
> --------------------------------------------------------
>
>                 Key: TEZ-933
>                 URL: https://issues.apache.org/jira/browse/TEZ-933
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>
> Edges rely on getting properties (specifically numTasks in this case) from the source or destination vertex.
> This can end up with an incorrect value being used depending on the state of the vertex - whether the vertex has been initialized, whether the parallelism has been changed etc.
> As an example
> {code}
> edgeManager.getNumSourceTaskPhysicalOutputs(destinationVertex.getTotalTasks(), sourceTaskIndex)
> {code}
> destinationVertex.getTotalTasks() may be incorrect if the destinationVertex hasn't yet been initialized. Alternatively, this value can change based on setParallelism calls.



--
This message was sent by Atlassian JIRA
(v6.2#6252)