You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@quickstep.apache.org by "Zuyu Zhang (JIRA)" <ji...@apache.org> on 2017/06/13 21:39:00 UTC

[jira] [Created] (QUICKSTEP-93) Better Scheduling Policy For Broadcast Hash Join in Distributed Version

Zuyu Zhang created QUICKSTEP-93:
-----------------------------------

             Summary: Better Scheduling Policy For Broadcast Hash Join in Distributed Version
                 Key: QUICKSTEP-93
                 URL: https://issues.apache.org/jira/browse/QUICKSTEP-93
             Project: Apache Quickstep
          Issue Type: Improvement
          Components: Distributed Query Execution
            Reporter: Zuyu Zhang
            Priority: Minor


In broadcast hash join, the build side has one partition, while the probe has multiple (more than one), so the build side should broadcast to each and every partition of the probe for the join.

In the distributed version, the default work order scheduling policy for the first BuildHashWorkOrder is data-locality based, which means it is possible that this parallel join that would run on multiple nodes downgrades to one node.

Alternatively, for a join in a broadcast manner, we could schedule each BuildHashWorkOrder for a partition of the probe to one node in a round-robin fashion. But unfortunately, we do not have the data locality info for the probe side when we schedule the build side work orders.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)