Posted to dev@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2013/10/27 12:33:30 UTC

[jira] [Commented] (TAJO-266) Extend ExecutionBlock and Task to support multiple outputs

    [ https://issues.apache.org/jira/browse/TAJO-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806313#comment-13806313 ] 

Jihoon Son commented on TAJO-266:
---------------------------------

For this issue, I designed a new class called ExecutionPlan.
An ExecutionPlan is a DAG which consists of LogicalNodes and their connections. Each connection represents a data flow between LogicalNodes.
Each ExecutionBlock contains an ExecutionPlan instead of a LogicalPlan.
When the master executes an ExecutionBlock, it sends that block's ExecutionPlan to the tasks.
After that, each task generates a PhysicalPlan from the given ExecutionPlan.
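
To make the DAG idea concrete, here is a minimal sketch of a plan that stores nodes and the data flows between them. It is only an illustration under my own assumptions, not the actual ExecutionPlan class: the name ExecutionPlanSketch, its methods, and the use of plain strings in place of LogicalNodes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual Tajo ExecutionPlan.
// Plain strings stand in for LogicalNodes.
class ExecutionPlanSketch {
    // Maps each node to the nodes that consume its output (one entry per data flow).
    private final Map<String, List<String>> consumers = new LinkedHashMap<>();

    void addNode(String node) {
        consumers.computeIfAbsent(node, k -> new ArrayList<>());
    }

    // Records a data flow from producer to consumer, adding both nodes if needed.
    void connect(String producer, String consumer) {
        addNode(producer);
        addNode(consumer);
        consumers.get(producer).add(consumer);
    }

    List<String> consumersOf(String node) {
        return consumers.getOrDefault(node, new ArrayList<>());
    }

    // A node with more than one consumer is a node with multiple outputs.
    boolean hasMultipleOutputs(String node) {
        return consumersOf(node).size() > 1;
    }
}
```

For example, calling connect("scan", "groupby1") and then connect("scan", "groupby2") yields a plan in which the scan node has two outputs, which a tree-shaped plan cannot express.
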
Here, I added two PhysicalNodes, called PhysicalRootExec and MultiOutExec, to support multiple outputs while preserving the pipelined query execution structure.
PhysicalRootExec is used only to represent the root of the physical plan.
MultiOutExec takes an integer n as a constructor argument.
When next() is called, MultiOutExec returns the same tuple n times.
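
The MultiOutExec behavior can be sketched roughly as follows. This is one possible reading, not the real implementation: I assume that each tuple pulled from the child is handed out on n consecutive next() calls, and that null signals the end of input. The class name MultiOutSketch and the use of a java.util.Iterator as the child operator are hypothetical.

```java
import java.util.Iterator;

// Hypothetical sketch, not Tajo's MultiOutExec.
// Assumption: each child tuple is returned on n consecutive next() calls.
class MultiOutSketch<T> {
    private final Iterator<T> child; // stands in for the child physical operator
    private final int n;             // number of times each tuple is returned
    private T current;               // tuple currently being repeated
    private int remaining = 0;       // how many more times to return current

    MultiOutSketch(Iterator<T> child, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.child = child;
        this.n = n;
    }

    // Returns the next tuple, or null once the child is exhausted.
    T next() {
        if (remaining == 0) {
            if (!child.hasNext()) {
                return null;
            }
            current = child.next(); // pull a new tuple only after n deliveries
            remaining = n;
        }
        remaining--;
        return current;
    }
}
```

With a child that yields a and b and n = 2, successive next() calls return a, a, b, b, and then null. The child is advanced only once per n calls, so the pull-based pipeline is preserved while n consumers each see every tuple.
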

I attached figures to help explain the design. They compare the current master plan with a master plan optimized by the YSmart algorithm (see TAJO-161).

While this structure looks a little complicated, it can support various master plan optimizations such as the one in TAJO-161.
So, based on this structure, I think we can develop a new master plan optimizer and optimization rules that significantly improve query processing performance.

Please give any advice.
Thanks.

> Extend ExecutionBlock and Task to support multiple outputs
> ----------------------------------------------------------
>
>                 Key: TAJO-266
>                 URL: https://issues.apache.org/jira/browse/TAJO-266
>             Project: Tajo
>          Issue Type: Task
>          Components: distributed query plan, worker
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>
> In the current Tajo, every task has only one output.
> However, supporting multiple outputs per task is very useful for distributed plan optimization.



--
This message was sent by Atlassian JIRA
(v6.1#6144)