Posted to issues@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2015/05/08 08:28:00 UTC

[jira] [Updated] (TAJO-1553) Improve broadcast join planning

     [ https://issues.apache.org/jira/browse/TAJO-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihoon Son updated TAJO-1553:
-----------------------------
    Description: 
The global engine generates a logical plan and then marks some parts of it as broadcast parts, meaning that those parts and their inputs will be broadcast to all workers.

Currently, broadcast parts are identified by rigid, hard-coded rules, which limits broadcast opportunities in many cases.
So, in this issue, I propose refactoring the broadcast planner to be more general.

Broadcast parts can be identified recursively:
* A leaf node is broadcast if its input size does not exceed the pre-defined threshold.
* An intermediate node is broadcast if it has at least one broadcast child.
* For outer joins, row-preserving tables must not be broadcast, to avoid duplicating input data.
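
The three rules above can be sketched as a single bottom-up pass over the plan tree. This is only an illustrative sketch in Python, not Tajo's actual planner code: PlanNode, mark_broadcast, the join kind strings, and the byte threshold are all hypothetical names chosen for the example. It also assumes that "row-preserving" means the left child of a left outer join, the right child of a right outer join, and both children of a full outer join, and it only clears the flag on the row-preserving child itself, since the issue does not say how nested sub-plans should be handled.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PlanNode:
    """Hypothetical plan node; not a Tajo class."""
    name: str
    kind: str = "scan"  # "scan", "inner", "left_outer", "right_outer", "full_outer"
    size_bytes: int = 0  # only meaningful for leaf (scan) nodes
    children: List["PlanNode"] = field(default_factory=list)
    broadcast: bool = False


def preserved_child_indexes(kind: str) -> Set[int]:
    """Child positions that are row-preserving for the given join kind (assumed)."""
    return {"left_outer": {0}, "right_outer": {1}, "full_outer": {0, 1}}.get(kind, set())


def mark_broadcast(node: PlanNode, threshold_bytes: int) -> bool:
    """Mark broadcast parts recursively and return the flag set on `node`."""
    if not node.children:
        # Rule 1: a leaf is broadcast if its input size does not exceed the threshold.
        node.broadcast = node.size_bytes <= threshold_bytes
        return node.broadcast

    flags = [mark_broadcast(child, threshold_bytes) for child in node.children]

    # Rule 3: a row-preserving input of an outer join must not be broadcast.
    for index in preserved_child_indexes(node.kind):
        node.children[index].broadcast = False
        flags[index] = False

    # Rule 2: an intermediate node is broadcast if at least one child still is.
    node.broadcast = any(flags)
    return node.broadcast
```

With a threshold of 100 bytes, an inner join of a 1000-byte table and a 10-byte table is marked as a broadcast part, while a left outer join whose only small input is the left (row-preserving) table is not.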

  was:
The global engine generates a logical plan and then marks some parts of it as broadcast parts, meaning that those parts and their inputs will be broadcast to all workers.

Currently, broadcast parts are identified by rigid, hard-coded rules, which limits broadcast opportunities in many cases.
So, in this issue, I propose refactoring the broadcast planner to be more general.

Broadcast parts can be identified recursively:
* A leaf node is broadcast if its input size does not exceed the pre-defined threshold.
* An intermediate node is broadcast if it has at least one broadcast child.


> Improve broadcast join planning
> -------------------------------
>
>                 Key: TAJO-1553
>                 URL: https://issues.apache.org/jira/browse/TAJO-1553
>             Project: Tajo
>          Issue Type: Improvement
>          Components: distributed query plan, planner/optimizer
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.11.0
>
>
> The global engine generates a logical plan and then marks some parts of it as broadcast parts, meaning that those parts and their inputs will be broadcast to all workers.
> Currently, broadcast parts are identified by rigid, hard-coded rules, which limits broadcast opportunities in many cases.
> So, in this issue, I propose refactoring the broadcast planner to be more general.
> Broadcast parts can be identified recursively:
> * A leaf node is broadcast if its input size does not exceed the pre-defined threshold.
> * An intermediate node is broadcast if it has at least one broadcast child.
> * For outer joins, row-preserving tables must not be broadcast, to avoid duplicating input data.


