Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/09/11 00:28:35 UTC

[jira] [Comment Edited] (TEZ-1539) Allow a FIRE_ONCE_ON_SUCCESS model for events generated by user code

    [ https://issues.apache.org/jira/browse/TEZ-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129245#comment-14129245 ] 

Bikas Saha edited comment on TEZ-1539 at 9/10/14 10:28 PM:
-----------------------------------------------------------

Setting aside the discussion on the change in semantics and looking at the patch itself.

Someone familiar with recovery needs to look at the recovery code. We should also trace this event flow to make sure it's not DataMovementEvent-specific. If it now carries other event types, then please rename the event.
{code}
           VertexDataMovementEventsGeneratedEvent historyEvent =
               new VertexDataMovementEventsGeneratedEvent(vertex.vertexId,
-                  dataMovementEvents);
+                  recoveryEvents);
{code}

In the route event path, the code tracks the source vertex name but then drops it when storing the event, because the key in the map is taskId. If there are multiple source vertices, then entries with the same taskId would overwrite each other. Or are we only going to support a single source vertex? If so, then that should be documented.
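
To illustrate the concern, here is a minimal standalone sketch. It is not code from the patch: the class, the key type, and the use of a bare task index as the map key are all assumptions made for illustration. It only shows how a map keyed by a per-vertex task index loses entries when two source vertices report the same index, and how a composite (vertex name, task index) key avoids that.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; none of these names come from the Tez code base.
class EventKeySketch {

    // Assumed composite key: source vertex name plus a task index within that vertex.
    static final class SourceTaskKey {
        final String vertexName;
        final int taskIndex;

        SourceTaskKey(String vertexName, int taskIndex) {
            this.vertexName = vertexName;
            this.taskIndex = taskIndex;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SourceTaskKey)) return false;
            SourceTaskKey other = (SourceTaskKey) o;
            return taskIndex == other.taskIndex && vertexName.equals(other.vertexName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(vertexName, taskIndex);
        }
    }

    // Keyed by task index alone: the second put replaces the first.
    static int storeByTaskIndexOnly() {
        Map<Integer, String> pending = new HashMap<>();
        pending.put(0, "event from vertex v1, task 0");
        pending.put(0, "event from vertex v2, task 0"); // overwrites the v1 entry
        return pending.size();
    }

    // Keyed by (vertex name, task index): both entries survive.
    static int storeByVertexAndTask() {
        Map<SourceTaskKey, String> pending = new HashMap<>();
        pending.put(new SourceTaskKey("v1", 0), "event from vertex v1, task 0");
        pending.put(new SourceTaskKey("v2", 0), "event from vertex v2, task 0");
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println("task index only: " + storeByTaskIndexOnly());
        System.out.println("vertex + task:   " + storeByVertexAndTask());
    }
}
```

If only a single source vertex is ever supported, the simpler key is sufficient, but that restriction would need to be stated.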

I feel the code might be simpler if it moved from VertexImpl to InputInitializerManager. At least it would not clutter the route event transitions and other transitions. IIM could invoke the sendIIEvent() method before initialization and do the buffering inside it. In the patch, VertexImpl has to call send after each place that it calls RootInput.initializer(). VertexImpl would then need to call IIM->notifyTaskSuccess() where in the patch it's calling VertexImpl.sendIIEvent(). This code seems local to IIM and kind of out of place in VertexImpl.
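
A rough sketch of what such buffering could look like if it lived in one place is below. This is not the actual InputInitializerManager; the class name, method names, and the string-based task and event types are all hypothetical. It assumes events are buffered per task attempt and released exactly once, when the first attempt of that task is reported successful.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of fire-once-on-success buffering; not Tez code.
class InitializerEventBufferSketch {
    // Events buffered per task attempt, keyed by "<task>#<attempt>".
    private final Map<String, List<String>> pendingByAttempt = new HashMap<>();
    // Tasks for which one successful attempt has already been handled.
    private final Set<String> succeededTasks = new HashSet<>();
    // Stand-in for "events handed to the initializer".
    private final List<String> delivered = new ArrayList<>();

    private static String attemptKey(String task, int attempt) {
        return task + "#" + attempt;
    }

    // Buffer an event from a running attempt; nothing is delivered yet.
    void bufferEvent(String task, int attempt, String event) {
        pendingByAttempt
            .computeIfAbsent(attemptKey(task, attempt), k -> new ArrayList<>())
            .add(event);
    }

    // Called when an attempt succeeds. Only the first successful attempt
    // of a task has its buffered events delivered; later ones are dropped.
    void notifyTaskSuccess(String task, int attempt) {
        List<String> events = pendingByAttempt.remove(attemptKey(task, attempt));
        boolean firstSuccess = succeededTasks.add(task);
        if (firstSuccess && events != null) {
            delivered.addAll(events);
        }
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        InitializerEventBufferSketch sketch = new InitializerEventBufferSketch();
        sketch.bufferEvent("v1:task0", 0, "event-from-attempt-0");
        sketch.bufferEvent("v1:task0", 1, "event-from-attempt-1");
        sketch.notifyTaskSuccess("v1:task0", 1);
        sketch.notifyTaskSuccess("v1:task0", 0); // ignored, task already succeeded
        System.out.println(sketch.delivered());
    }
}
```

With something along these lines, the vertex would only need to forward events and success notifications, and the buffering state would stay out of the vertex transitions.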


> Allow a FIRE_ONCE_ON_SUCCESS model for events generated by user code
> --------------------------------------------------------------------
>
>                 Key: TEZ-1539
>                 URL: https://issues.apache.org/jira/browse/TEZ-1539
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-1539.1.wip.txt, TEZ-1539.2.txt
>
>
> Specifically for InputInitializerEvents and VertexManagerEvents.
> Pasting comment from TEZ-1447
> In a majority of cases, events generated by different attempts of the same task will be identical - in which case just making use of the event generated by the first successful attempt is adequate. Doing something like this means that users don't worry about retries, indices etc - and can just rely on receiving a set of events which are to be processed once the vertex succeeds.
> If different attempts of the same workload generate different events - processing is likely to be incorrect, since it's very possible for all data to be processed (VERTEX successful), then a failure and retry - which generates a different event. The initializer doesn't even run at this point, since it's already done its work and is complete. Handling such scenarios likely involves re-running the entire initializer and re-starting the vertex which processed the event from scratch. In situations like this, where data generated may be different, the best bet is for speculation to be disabled (when it's supported), and max-attempts to be set to 1.


