You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@quickstep.apache.org by zuyu <gi...@git.apache.org> on 2017/06/14 04:45:07 UTC

[GitHub] incubator-quickstep pull request #257: Added Partition Rule for HashJoin.

GitHub user zuyu opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/257

    Added Partition Rule for HashJoin.

    This PR substitutes #241 to add support for partitioned hash join. The rule adds repartitions if needed as follows:
    
    ```
          /*
           * Whether left or right side needs to repartition.
           *
           * --------------------------------------------------------------------------
           * | Right \ Left     | No Partition  | Hash Partition h' | Other Partition |
           * --------------------------------------------------------------------------
           * | No Partition     | false \ false |  false \ false    |  true \ true    |
           * --------------------------------------------------------------------------
           * | Hash Partition h | false \ true  | false* \ false    | false \ true    |
           * --------------------------------------------------------------------------
           * | Other Partition  |  true \ true  |   true \ false    |  true \ true    |
           * --------------------------------------------------------------------------
           *
           * Hash Partition h / h': the paritition attributes are as the same as the join attributes.
           * *: If h and h' has different number of partitions, the left side needs to repartition.
           */
    ```
    
    Assigned to @jianqiao.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zuyu/incubator-quickstep partition-rule-for-hash-join

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #257
    
----
commit 46f514dcd5d22ab3cecaa8dcb2446ebdd8dba6e7
Author: Zuyu Zhang <zu...@apache.org>
Date:   2017-06-14T02:50:41Z

    Added Partition Rule for HashJoin.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #257: Added Partition Rule for HashJoin.

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/257


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #257: Added Partition Rule for HashJoin.

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/257#discussion_r122334121
  
    --- Diff: query_optimizer/PhysicalGenerator.cpp ---
    @@ -176,6 +177,8 @@ P::PhysicalPtr PhysicalGenerator::optimizePlan() {
         rules.emplace_back(new AttachLIPFilters());
       }
     
    +  rules.push_back(std::make_unique<Partition>(optimizer_context_));
    --- End diff --
    
    In the single-node version on a `NUMA` machine, we needs this partition rule. Sure, I am adding a flag to switch this rule.
    
    Regarding the first question, if I put the partition rule before `LIP` rules, how could I process `FilterJoin` with partitioned inputs?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #257: Added Partition Rule for HashJoin.

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/257#discussion_r122326409
  
    --- Diff: query_optimizer/PhysicalGenerator.cpp ---
    @@ -176,6 +177,8 @@ P::PhysicalPtr PhysicalGenerator::optimizePlan() {
         rules.emplace_back(new AttachLIPFilters());
       }
     
    +  rules.push_back(std::make_unique<Partition>(optimizer_context_));
    --- End diff --
    
    The other question is that -- is this rule always needed to be applied (e.g. even in single-node mode)? If not, we may have a flag or `#ifdef` to conditionally apply this rule.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #257: Added Partition Rule for HashJoin.

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/257
  
    Good refactoring. LGTM! Except one place in `PhysicalGenerator.cpp`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #257: Added Partition Rule for HashJoin.

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/257#discussion_r122325834
  
    --- Diff: query_optimizer/PhysicalGenerator.cpp ---
    @@ -176,6 +177,8 @@ P::PhysicalPtr PhysicalGenerator::optimizePlan() {
         rules.emplace_back(new AttachLIPFilters());
       }
     
    +  rules.push_back(std::make_unique<Partition>(optimizer_context_));
    --- End diff --
    
    Is it okay to move this rule before the two LIP-related rules? Currently the LIP information may be invalidated (and lead to incorrect results) if tree transformation happens after the two LIP passes.
    
    Meanwhile, better add some comments describing what additional information would be added into the physical nodes after this pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #257: Added Partition Rule for HashJoin.

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/257#discussion_r122482057
  
    --- Diff: query_optimizer/PhysicalGenerator.cpp ---
    @@ -176,6 +177,8 @@ P::PhysicalPtr PhysicalGenerator::optimizePlan() {
         rules.emplace_back(new AttachLIPFilters());
       }
     
    +  rules.push_back(std::make_unique<Partition>(optimizer_context_));
    --- End diff --
    
    I have revisited the partition rule, and think it is ok for now to put it between two LIP-related rules, as for `HashJoin` we only have the following changes to a physical plan
     1. add additional `Selections` for base tables or subquery relations, and a few non-LIP related operators like `sort` and `union all`.
     1. change output `PartitionSchemeHeader` of a Physical Plan node.
    
    But for partitioned aggregation in a follow-up PR, we may add another aggregate operator as a finalize step. On the other hand, the original aggregation maybe substituted by a newly created aggregation with the same input node, so I put the partition rule before `AttachLIPFilters`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #257: Added Partition Rule for HashJoin.

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/257
  
    Merging after discussing with @jianqiao.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---