Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2015/07/30 23:50:05 UTC

[jira] [Comment Edited] (TEZ-2633) Allow VertexManagerPlugins to receive and report based on attempts instead of tasks

    [ https://issues.apache.org/jira/browse/TEZ-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648368#comment-14648368 ] 

Siddharth Seth edited comment on TEZ-2633 at 7/30/15 9:50 PM:
--------------------------------------------------------------

On the identifiers.
Moving either the current identifiers or a new set of identifiers into the public API is required by various plugins for efficiency. We've discussed this offline - APIs looking like onEvent(String vertexName, int vertexNumber, int taskId, int taskAttemptNumber) are fairly ugly compared to onEvent(TaskAttemptId taskAttemptId).
Also, users are likely to define their own structure with this information (vertexName, vertexNumber, taskId, etc.), which would be identical to what can be defined by Tez.
This is a fairly big problem in TEZ-2003, which currently makes use of internal IDs for simplicity and to avoid re-implementing similar holder classes.
+1 for making a change along these lines. Ideally in a separate jira - but it seems to be integrated into this patch already.
This should allow a lot of APIs to become simpler in other components, and potentially even change the serialization of information to tasks.
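
To make the signature comparison concrete, here is a minimal Java sketch of a single holder object replacing the four loose arguments. Every name in it (TaskAttemptIdentifier, AttemptEventListener, RecordingListener) is hypothetical and chosen for illustration; it is not the API in the patch and may differ from what Tez ends up defining.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical holder: bundles the fields that the flat signature
// onEvent(String, int, int, int) would pass separately.
final class TaskAttemptIdentifier {
  final String vertexName;
  final int vertexNumber;
  final int taskId;
  final int attemptNumber;

  TaskAttemptIdentifier(String vertexName, int vertexNumber, int taskId, int attemptNumber) {
    this.vertexName = Objects.requireNonNull(vertexName);
    this.vertexNumber = vertexNumber;
    this.taskId = taskId;
    this.attemptNumber = attemptNumber;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof TaskAttemptIdentifier)) return false;
    TaskAttemptIdentifier other = (TaskAttemptIdentifier) o;
    return vertexNumber == other.vertexNumber
        && taskId == other.taskId
        && attemptNumber == other.attemptNumber
        && vertexName.equals(other.vertexName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(vertexName, vertexNumber, taskId, attemptNumber);
  }
}

// Hypothetical plugin callback: one identifier instead of four parameters.
interface AttemptEventListener {
  void onEvent(TaskAttemptIdentifier attempt);
}

// Example listener that records which attempts reported an event.
final class RecordingListener implements AttemptEventListener {
  final List<TaskAttemptIdentifier> seen = new ArrayList<>();

  @Override
  public void onEvent(TaskAttemptIdentifier attempt) {
    seen.add(attempt);
  }
}
```

With a holder like this, an event from an attempt and the same event from its retry arrive as two distinct identifiers, so a plugin can tell them apart without defining its own structure.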

Agree with it being an interface in the tez-api module.
- Rather than wrapping the internal identifier implementations, they could implement the interface directly, which keeps the number of instances to a minimum. Entities like dagName and vertexName would have to be added to them, though.
- TezDagId, TezVertexId, etc. have logic to keep the number of instances to a minimum in the AM. Equal VertexIds will only be created once - likewise for all the other IDs. The same logic will likely be needed here.
- DagIdentifier could use an attempt number.
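
As a rough sketch of the first two points, the Java below shows an internal ID class implementing a public interface directly (no wrapper object) and reusing one instance per distinct ID. The names VertexIdentifier, InternalVertexId and getInstance are hypothetical, and the ConcurrentHashMap cache is an assumption made for illustration; it is not necessarily how TezVertexId caches its instances.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical public interface, as it might be exposed from tez-api.
interface VertexIdentifier {
  String getName();
  int getIdentifier();
}

// Hypothetical internal ID that implements the interface itself,
// so no extra wrapper instance is needed per ID.
final class InternalVertexId implements VertexIdentifier {
  // One shared instance per (id, name) pair. The key starts with the
  // numeric id followed by ':', so it is unambiguous for any name.
  private static final Map<String, InternalVertexId> CACHE = new ConcurrentHashMap<>();

  private final String name;
  private final int id;

  private InternalVertexId(String name, int id) {
    this.name = name;
    this.id = id;
  }

  // Returns the cached instance, creating it only on first request.
  static InternalVertexId getInstance(String name, int id) {
    return CACHE.computeIfAbsent(id + ":" + name, key -> new InternalVertexId(name, id));
  }

  @Override
  public String getName() {
    return name;
  }

  @Override
  public int getIdentifier() {
    return id;
  }
}
```

Because the constructor is private, all callers go through getInstance, so two requests for the same vertex return the same object. An unbounded static map like this one never releases entries; a real implementation would need some eviction or weak-reference policy.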

API changes on VertexManagerPlugin look good to me. Can we drop "getCausalTaskAttemptIdentifier" from ScheduleTaskRequest and re-introduce it in the patches that actually make the relevant changes?

Haven't looked at the rest in detail. I can look if no one else is reviewing the patch.




> Allow VertexManagerPlugins to receive and report based on attempts instead of tasks
> -----------------------------------------------------------------------------------
>
>                 Key: TEZ-2633
>                 URL: https://issues.apache.org/jira/browse/TEZ-2633
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-2633.1.patch
>
>
> If the same event is sent from an attempt and its retry then there is no way to differentiate between them.


