Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2014/09/04 20:17:51 UTC

[jira] [Commented] (TEZ-1447) Provide a mechanism for InputInitializers to know about interesting Vertex state changes

    [ https://issues.apache.org/jira/browse/TEZ-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121702#comment-14121702 ] 

Hitesh Shah commented on TEZ-1447:
----------------------------------

IMHO, Option 1, register(VertexName), might create problems in the long term. Today, the state changes one can listen to are very simple, i.e. vertex status changes or parallelism updates. What if we add support for task or task attempt launch or completion notifications, e.g. notify me whenever any task of vertex X finishes? In this scenario, Option 1 will bombard all listeners with too many updates. Restricting the set of notifications each listener receives will reduce the load on the AM.
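
To make the concern concrete, here is a rough sketch of a registration call that takes the set of update types a listener cares about, so that unrelated updates are never delivered to it. All names here (FilteredVertexNotifier, UpdateType, Listener, register, publish) are hypothetical and only illustrate the idea; this is not the Tez API and not the contents of the attached patches.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types exist in Tez under these names.
class FilteredVertexNotifier {

  /** Hypothetical kinds of update a listener might ask for. */
  enum UpdateType { VERTEX_STATE_CHANGED, PARALLELISM_UPDATED, TASK_COMPLETED }

  /** Hypothetical callback interface. */
  interface Listener {
    void onUpdate(String vertexName, UpdateType type);
  }

  /** One listener together with the update types it registered for. */
  private static final class Registration {
    final Set<UpdateType> interested;
    final Listener listener;

    Registration(Set<UpdateType> interested, Listener listener) {
      // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case.
      this.interested = interested.isEmpty()
          ? EnumSet.noneOf(UpdateType.class)
          : EnumSet.copyOf(interested);
      this.listener = listener;
    }
  }

  private final Map<String, List<Registration>> byVertex = new HashMap<>();

  /** Register for a restricted set of update types on one vertex. */
  synchronized void register(String vertexName, Set<UpdateType> interested,
                             Listener listener) {
    byVertex.computeIfAbsent(vertexName, k -> new ArrayList<>())
        .add(new Registration(interested, listener));
  }

  /** Deliver an update only to listeners that asked for this type; returns the delivery count. */
  synchronized int publish(String vertexName, UpdateType type) {
    int delivered = 0;
    for (Registration r : byVertex.getOrDefault(vertexName, new ArrayList<>())) {
      if (r.interested.contains(type)) {
        r.listener.onUpdate(vertexName, type);
        delivered++;
      }
    }
    return delivered;
  }

  public static void main(String[] args) {
    FilteredVertexNotifier notifier = new FilteredVertexNotifier();
    notifier.register("vertexX", EnumSet.of(UpdateType.PARALLELISM_UPDATED),
        (v, t) -> System.out.println(v + " -> " + t));
    // Not delivered: the listener did not ask for task completions.
    notifier.publish("vertexX", UpdateType.TASK_COMPLETED);
    // Delivered: this type was requested at registration.
    notifier.publish("vertexX", UpdateType.PARALLELISM_UPDATED);
  }
}
```

With a plain register(VertexName), every TASK_COMPLETED update would reach every listener on that vertex. In this sketch the filtering cost is one set lookup per registration, and listeners that only want parallelism updates stay quiet while tasks finish.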



> Provide a mechanism for InputInitializers to know about interesting Vertex state changes
> ----------------------------------------------------------------------------------------
>
>                 Key: TEZ-1447
>                 URL: https://issues.apache.org/jira/browse/TEZ-1447
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Gunther Hagleitner
>            Assignee: Siddharth Seth
>            Priority: Blocker
>         Attachments: TEZ-1447.1.wip.txt, TEZ-1447.2.txt
>
>
> I'm trying to do dynamic partition pruning through input initializer events in Hive. That means that the initializer of a table scan vertex has to receive events from all tasks in another vertex (which contain the pruning info) before generating tasks to run.
> The problem with the current API I ran into:
> getNumTasks: I'm currently using a busy loop to wait for the num tasks for a vertex to be decided (-1 -> x). There's no way around it, because it's the only way to find out what number of events to expect (0 is a valid number of tasks - so I can't wait for the first to complete).
> With auto-reducer parallelism I have to employ another busy loop, because I might initially be expecting 10 events, a number which later gets knocked down to 5. Since there's no event associated with this, I have to periodically check whether I have enough events.
> Versioning: Events have a version number, but I don't know which task they are coming from. Thus I can't de-dup events.
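
For illustration only, the three problems in the description (unknown task count, parallelism changing later, and de-duplication) could be handled by an initializer-side tracker like the sketch below, provided the framework pushed parallelism updates and tagged each event with its source task index. Both of those are assumptions: the description says neither is available today. The class and method names (PruningEventTracker, onParallelismUpdated, onEvent) are hypothetical, and treating a repeated or lower version from the same task as a duplicate is also an assumption about what the version number means.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not part of Tez or Hive.
class PruningEventTracker {
  // -1 means the source vertex's parallelism has not been decided yet.
  private int expectedTasks = -1;
  // Highest event version seen so far, keyed by (assumed) source task index.
  private final Map<Integer, Integer> latestVersionByTask = new HashMap<>();

  /** Called when the task count is first decided or later changed (e.g. 10 -> 5). */
  synchronized void onParallelismUpdated(int numTasks) {
    expectedTasks = numTasks;
    notifyAll(); // waiters re-check the completion condition
  }

  /** Records an event; returns false if it repeats or predates one already seen for that task. */
  synchronized boolean onEvent(int sourceTaskIndex, int version) {
    Integer previous = latestVersionByTask.get(sourceTaskIndex);
    if (previous != null && previous >= version) {
      return false; // duplicate or stale
    }
    latestVersionByTask.put(sourceTaskIndex, version);
    notifyAll();
    return true;
  }

  /** True once the task count is known and every task has reported at least once. */
  synchronized boolean isComplete() {
    return expectedTasks >= 0 && latestVersionByTask.size() >= expectedTasks;
  }

  /** Blocks without polling until isComplete() holds. */
  synchronized void awaitComplete() throws InterruptedException {
    while (!isComplete()) {
      wait();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    PruningEventTracker tracker = new PruningEventTracker();
    tracker.onEvent(0, 0);
    tracker.onEvent(0, 0);           // ignored as a duplicate
    tracker.onParallelismUpdated(2); // task count becomes known
    tracker.onEvent(1, 0);
    tracker.awaitComplete();         // returns immediately: both tasks reported
    System.out.println("complete = " + tracker.isComplete());
  }
}
```

A task count of 0 completes as soon as it is reported, which covers the "0 is a valid number of tasks" case without waiting for a first event. The sketch assumes events only arrive from tasks that exist under the final parallelism; it does not discard events from task indices that a later reduction removes.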



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)