Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2015/02/14 03:21:11 UTC

[jira] [Commented] (TEZ-2103) Implement a Partial completion VertexManagerPlugin

    [ https://issues.apache.org/jira/browse/TEZ-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321154#comment-14321154 ] 

Bikas Saha commented on TEZ-2103:
---------------------------------

The vertex manager plugin is a user-defined artifact and as such does not have to live in the Tez project. Is limit computation portable across projects like Hive and Pig, so that it could share a common home in Tez?

> Implement a Partial completion VertexManagerPlugin
> --------------------------------------------------
>
>                 Key: TEZ-2103
>                 URL: https://issues.apache.org/jira/browse/TEZ-2103
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Gopal V
>
> Currently, there is no sibling communication between tasks. This implies that the work of a vertex can be completed by the first task in a wave of tasks, but the entire wave of tasks has to complete before success can be reported.
> This occurs in limit + filter query patterns common across the data access engines.
> {code}
> select * from data where x > 1 limit 10;
> {code}
> will run a full table scan's worth of tasks, each generating up to 10 rows, and then aggregate those rows to produce the final 10-row result.
> The VertexManager receives counters/events early enough to short-circuit the rest of the vertex's tasks, preventing the remaining tasks from being scheduled once an initial subset of the tasks has satisfied the limit condition.
> This is a specialization of the VertexManagerPlugin for this common-case scheduling pattern.
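
The short-circuit idea described above can be sketched as a small standalone simulation. This is only an illustration under stated assumptions: LimitTracker, onTaskCompleted, limitSatisfied and simulate are hypothetical names, not part of the Tez VertexManagerPlugin API, and the sketch assumes each completed task reports its output row count (for example via a counter) before the next wave is scheduled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, NOT a Tez class: tracks rows produced toward a LIMIT. */
final class LimitTracker {
    private final long limit;
    private long rowsSeen = 0;

    LimitTracker(long limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
    }

    /** Record the row count reported by a finished task (assumed to arrive as a counter/event). */
    void onTaskCompleted(long outputRows) {
        rowsSeen += outputRows;
    }

    /** True once the completed tasks have produced enough rows to satisfy the limit. */
    boolean limitSatisfied() {
        return rowsSeen >= limit;
    }

    /**
     * Simulates scheduling tasks in waves of waveSize and returns the ids of the
     * tasks that were actually scheduled. A new wave is started only while the
     * limit is still unsatisfied; a wave that has started always runs to completion.
     */
    static List<Integer> simulate(long[] rowsPerTask, int waveSize, long limit) {
        if (waveSize <= 0) {
            throw new IllegalArgumentException("waveSize must be > 0");
        }
        LimitTracker tracker = new LimitTracker(limit);
        List<Integer> scheduled = new ArrayList<>();
        for (int start = 0;
             start < rowsPerTask.length && !tracker.limitSatisfied();
             start += waveSize) {
            int end = Math.min(start + waveSize, rowsPerTask.length);
            for (int task = start; task < end; task++) {
                scheduled.add(task);
                // Each task applies the limit locally, so it emits at most `limit` rows.
                tracker.onTaskCompleted(Math.min(rowsPerTask[task], limit));
            }
        }
        return scheduled;
    }
}
```

For example, with six tasks, a wave size of 2 and a limit of 10, LimitTracker.simulate(new long[] {4, 3, 5, 10, 10, 10}, 2, 10) schedules only tasks 0 through 3: the first wave yields 7 rows, the second wave pushes the total past 10, and the third wave is never scheduled.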



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)