Posted to dev@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2013/10/27 12:59:30 UTC
[jira] [Comment Edited] (TAJO-266) Extend ExecutionBlock and Task to support multiple outputs

    [ https://issues.apache.org/jira/browse/TAJO-266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806313#comment-13806313 ] 

Jihoon Son edited comment on TAJO-266 at 10/27/13 11:57 AM:
------------------------------------------------------------

For this issue, I designed a new class called ExecutionPlan.
An ExecutionPlan is a DAG which consists of LogicalNodes and their connections. Each connection represents a data flow between LogicalNodes.
Each ExecutionBlock contains an ExecutionPlan instead of a LogicalPlan.
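As an illustration only, the DAG described above can be sketched in Java as an adjacency list in which each edge runs from a producing node to the node that consumes its output. `PlanNode` and `ExecutionPlanSketch` are hypothetical simplified classes, not Tajo's `LogicalNode` or the proposed `ExecutionPlan`. The sketch assumes the caller never adds a cycle and does not check for one. A node with more than one consumer marks a place where a task would need multiple outputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a logical operator (not Tajo's LogicalNode).
final class PlanNode {
  final String name;
  PlanNode(String name) { this.name = name; }
  @Override public String toString() { return name; }
}

// Hypothetical sketch: an adjacency list of plan nodes. Each edge (producer -> consumer)
// is a data flow. Acyclicity is assumed by the caller, not checked here.
final class ExecutionPlanSketch {
  private final Map<PlanNode, List<PlanNode>> consumers = new LinkedHashMap<>();

  void addNode(PlanNode node) {
    consumers.putIfAbsent(node, new ArrayList<>());
  }

  // Records that 'producer' feeds its output tuples into 'consumer'.
  void connect(PlanNode producer, PlanNode consumer) {
    addNode(producer);
    addNode(consumer);
    consumers.get(producer).add(consumer);
  }

  // Returns the nodes that consume this node's output (empty if none or unknown).
  List<PlanNode> consumersOf(PlanNode node) {
    return consumers.getOrDefault(node, List.of());
  }

  // Nodes whose output is shared by more than one consumer, i.e. the points
  // where a single task would have to produce multiple outputs.
  List<PlanNode> multiOutputNodes() {
    List<PlanNode> result = new ArrayList<>();
    for (Map.Entry<PlanNode, List<PlanNode>> e : consumers.entrySet()) {
      if (e.getValue().size() > 1) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```

For example, connecting one scan node to two group-by nodes makes `multiOutputNodes()` return a list containing only the scan node.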
When a master executes an ExecutionBlock, it sends the ExecutionBlock's ExecutionPlan to the tasks.
After that, each task generates a PhysicalPlan from the given ExecutionPlan.
To support multiple outputs while preserving the pipelined query execution structure, I added two PhysicalNodes: PhysicalRootExec and MultiOutExec.
PhysicalRootExec simply marks the root of the physical plan.
MultiOutExec takes an integer n as a constructor argument.
It reads a tuple from its child and returns it n times.
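The following is a minimal sketch of the two operators, assuming a pull-based (Volcano-style) operator model in which `next()` returns one tuple at a time and `null` at end of input. The `Exec` interface, `ListScanExec`, `MultiOutExecSketch`, `RootExecSketch`, and the use of `Object[]` as a tuple are all hypothetical simplifications, not Tajo's actual physical operator API.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical pull-based operator interface: next() returns the next tuple,
// or null once the input is exhausted.
interface Exec {
  Object[] next();
}

// Hypothetical leaf operator that emits tuples from an in-memory list (stands in for a scan).
final class ListScanExec implements Exec {
  private final Iterator<Object[]> it;
  ListScanExec(List<Object[]> tuples) { this.it = tuples.iterator(); }
  @Override public Object[] next() { return it.hasNext() ? it.next() : null; }
}

// Sketch of the MultiOutExec behavior: each tuple pulled from the child is
// returned n times before the next child tuple is requested.
final class MultiOutExecSketch implements Exec {
  private final Exec child;
  private final int n;
  private Object[] current;  // tuple currently being repeated
  private int remaining;     // how many more times 'current' will be returned

  MultiOutExecSketch(Exec child, int n) {
    if (n < 1) throw new IllegalArgumentException("n must be >= 1");
    this.child = child;
    this.n = n;
  }

  @Override public Object[] next() {
    if (remaining == 0) {
      current = child.next();
      if (current == null) return null;  // child exhausted
      remaining = n;
    }
    remaining--;
    return current;  // same object each time; no copy is made
  }
}

// Sketch of PhysicalRootExec: it only marks the top of the physical plan and
// passes tuples from its child through unchanged.
final class RootExecSketch implements Exec {
  private final Exec child;
  RootExecSketch(Exec child) { this.child = child; }
  @Override public Object[] next() { return child.next(); }
}
```

With a child that yields tuples `a` and `b` and `n = 2`, pulling from the root yields `a, a, b, b` and then `null`. Because the sketch returns the same tuple object n times rather than copying it, a consumer that mutates tuples would need to copy them first. How each of the n copies is routed to its output is deliberately left out of this sketch.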

I have attached figures to make this easier to understand. They compare the current master plan with a master plan optimized by the YSmart algorithm (see TAJO-161).

Although this structure looks a little complicated, it can support various master plan optimizations, such as TAJO-161.
Based on this structure, I think we can develop a new master plan optimizer and optimization rules that could significantly improve query processing performance.

Any advice would be welcome.
Thanks.


was (Author: jihoonson):
For this issue, I designed a new class called ExecutionPlan.
An ExecutionPlan is a DAG which consists of LogicalNodes and their connections. Each connection represents a data flow between LogicalNodes.
Each ExecutionBlock contains an ExecutionPlan instead of a LogicalPlan.
When a master executes an ExecutionBlock, it sends the ExecutionBlock's ExecutionPlan to the tasks.
After that, each task generates a PhysicalPlan from the given ExecutionPlan.
To support multiple outputs while preserving the pipelined query execution structure, I added two PhysicalNodes: PhysicalRootExec and MultiOutExec.
PhysicalRootExec simply marks the root of the physical plan.
MultiOutExec takes an integer n as a constructor argument.
When next() is called, MultiOutExec returns the same tuple n times.

I have attached figures to make this easier to understand. They compare the current master plan with a master plan optimized by the YSmart algorithm (see TAJO-161).

Although this structure looks a little complicated, it can support various master plan optimizations, such as TAJO-161.
Based on this structure, I think we can develop a new master plan optimizer and optimization rules that could significantly improve query processing performance.

Any advice would be welcome.
Thanks.

> Extend ExecutionBlock and Task to support multiple outputs
> ----------------------------------------------------------
>
>                 Key: TAJO-266
>                 URL: https://issues.apache.org/jira/browse/TAJO-266
>             Project: Tajo
>          Issue Type: Task
>          Components: distributed query plan, worker
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>         Attachments: convert execution plan.jpg, current master plan.jpg, optimized master plan.jpg
>
>
> In the current Tajo, every task has only one output.
> However, supporting multiple outputs per task would be very useful for distributed plan optimization.



--
This message was sent by Atlassian JIRA
(v6.1#6144)