You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@quickstep.apache.org by dylanpbacon <gi...@git.apache.org> on 2017/09/21 21:40:17 UTC

[GitHub] incubator-quickstep pull request #300: Hash-Join-Fuse: Feature added and tes...

GitHub user dylanpbacon opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/300

    Hash-Join-Fuse: Feature added and tests modified.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dylanpbacon/incubator-quickstep Hash-Join-Fuse

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/300.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #300
    
----
commit fac8b97aa198580a86cca46fd53b8a75b32dbfd3
Author: Dylan Bacon <dy...@gmail.com>
Date:   2017-09-20T19:09:32Z

    Hash-Join-Fuse: Feature added and tests modified.

----


---

[GitHub] incubator-quickstep pull request #300: QUICKSTEP-106: Hash-Join-Fuse: Featur...

Posted by dylanpbacon <gi...@git.apache.org>.
Github user dylanpbacon commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/300#discussion_r140610972
  
    --- Diff: query_optimizer/strategy/Join.cpp ---
    @@ -371,6 +373,7 @@ void Join::addHashJoin(const logical::ProjectPtr &logical_project,
                               left_join_attributes,
                               right_join_attributes,
                               residual_predicate,
    +                          build_predicate,
    --- End diff --
    
    Agreed, less superfluous that way.


---

[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/300
  
    We need to fix the performance bugs before merging.
    
    |  **TPC-H** | **master** | **this PR** |
    |  ------ | ------: | ------: |
    |  01 | 5,421 | 5,039 |
    |  02 | 402 | 3,968 |
    |  03 | 1,229 | 3,506 |
    |  04 | 2,472 | 1,242 |
    |  05 | 2,807 | 3,731 |
    |  06 | 399 | 394 |
    |  07 | 1,754 | 3,432 |
    |  08 | 906 | 9,961 |
    |  09 | 6,766 | TO |


---

[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...

Posted by dylanpbacon <gi...@git.apache.org>.
Github user dylanpbacon commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/300
  
    I believe all bugs have been fixed except the LIPFilter applying to modified HashJoins. Jianqiao will be taking care of that I think. Below are the updated benchmark values.
    
    
    TPC-H | Master | PR | Change
    -- | -- | -- | --
    1 | 5159.529375 | 5210.64075 | 0.9901909616
    2 | 360.623375 | 2624.057875 | 0.1374296575
    3 | 1191.425875 | 1179.699875 | 1.009939816
    4 | 2468.887125 | 2231.8005 | 1.106231101
    5 | 3577.305125 | 3599.183125 | 0.9939213985
    6 | 382.312125 | 379.8185 | 1.006565307
    7 | 1714.8465 | 1770.731 | 0.9684398703
    8 | 874.823375 | 879.3375 | 0.9948664478
    9 | 4803.38075 | 5012.880125 | 0.9582077828
    10 | 3110.523875 | 3110.895625 | 0.9998805007
    11 | 255.26625 | 241.432875 | 1.057296982
    12 | 1419.048375 | 1382.390375 | 1.026517835
    13 | 2947.814 | 2933.04975 | 1.005033754
    14 | 742.84975 | 733.042625 | 1.013378656
    15 | 426.758625 | 429.820375 | 0.9928766755
    16 | 1949.653375 | 2026.087375 | 0.9622750722
    17 | 1243.635125 | 992.756375 | 1.252709281
    18 | 3157.806875 | 3153.086875 | 1.001496946
    19 | 342.213875 | 574.703 | 0.5954621344
    20 | 954.45075 | 943.952875 | 1.011121185
    21 | 8464.721875 | 7656.392875 | 1.105575695
    22 | 711.613125 | 727.60525 | 0.9780208774
      |   |   |  
    Total | 46259.4895 | 47793.3655 | 0.9679060894
    



---

[GitHub] incubator-quickstep pull request #300: QUICKSTEP-106: Hash-Join-Fuse: Featur...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/300


---

[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...

Posted by dylanpbacon <gi...@git.apache.org>.
Github user dylanpbacon commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/300
  
    Rebased this with master, this branch is now up to date. LIP Filters still need to be fixed to work properly with this feature.


---

[GitHub] incubator-quickstep pull request #300: QUICKSTEP-106: Hash-Join-Fuse: Featur...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/300#discussion_r186265132
  
    --- Diff: query_optimizer/ExecutionGenerator.cpp ---
    @@ -966,7 +975,9 @@ void ExecutionGenerator::convertHashJoin(const P::HashJoinPtr &physical_plan) {
       const CatalogRelation *probe_relation = probe_relation_info->relation;
     
       // FIXME(quickstep-team): Add support for self-join.
    -  if (build_relation == probe_relation) {
    +  // We check to see if the build_predicate is null as certain queries that
    +  // support hash-select fuse will result in the first check being true.
    +  if (build_relation == probe_relation && physical_plan->build_predicate() == nullptr) {
    --- End diff --
    
    This change introduces a bug, because the execution engine uses the relation id to determine the build or the probe side. In the self-join case, it thus always goes to the if-case (mostly the build side), even we actually reference the probe side with the same relation id.
    
    We do not need any further action for this, because #347 will introduce the self-join.


---

[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/300
  
    Hi @jianqiao Could you explain more regarding the suspect reason of the slowdown in the general case (i.e., w/o the threshold)? Thanks!


---

[GitHub] incubator-quickstep pull request #300: QUICKSTEP-106: Hash-Join-Fuse: Featur...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/300#discussion_r140575077
  
    --- Diff: query_optimizer/rules/FuseHashSelect.cpp ---
    @@ -0,0 +1,91 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "query_optimizer/rules/FuseHashSelect.hpp"
    +
    +#include <algorithm>
    +#include <cstddef>
    +#include <unordered_set>
    +#include <vector>
    +
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExpressionUtil.hpp"
    +#include "query_optimizer/expressions/PatternMatcher.hpp"
    +#include "query_optimizer/expressions/Predicate.hpp"
    +#include "query_optimizer/physical/HashJoin.hpp"
    +#include "query_optimizer/physical/PatternMatcher.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/PhysicalType.hpp"
    +#include "query_optimizer/physical/Selection.hpp"
    +
    +#include "glog/logging.h"
    --- End diff --
    
    Do we actually use sth from all these headers above? If no, please remove the unused headers.


---

[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/300
  
    @dylanpbacon Hi Dylan, I updated the LIP related stuff and put it in this [branch](https://github.com/apache/incubator-quickstep/tree/Hash-Join-Fuse), you may just fetch the changes to reset your repo's `Hash-Join-Fuse` branch, then `git push -f`, then we can merge this PR.
    
    **Note:**
    The optimization is enabled by default, and I added an extra gflag `fuse-hash-select-threshold` that is set to one million (`1000000u`) by default. A fusion transformation is applied only when the estimated cardinality of the build-side selection is _greater_ than the threshold.
    
    Overall speaking, the fuse-hash-select optimization is especially beneficial when _the build-side selection has large output cardinality_ (e.g. TPC-H Q21), and the benefits come from two aspects:
    (1) smaller memory footprint,
    (2) avoiding materialization of the selection's output.
    
    However, due to some issues in current implementation of HashJoinOperator (and buffer manager), the fusion may slow down some queries (e.g. TPC-H Q02, 400ms -> 700ms with LIP fixed). The `fuse-hash-select-threshold` prevents those situations.



---

[GitHub] incubator-quickstep pull request #300: QUICKSTEP-106: Hash-Join-Fuse: Featur...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/300#discussion_r140575606
  
    --- Diff: query_optimizer/rules/FuseHashSelect.hpp ---
    @@ -0,0 +1,67 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_FUSE_HASH_SELECT_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_FUSE_HASH_SELECT_HPP_
    +
    +#include <cstdint>
    +#include <memory>
    +#include <string>
    +
    +
    +#include "query_optimizer/physical/HashJoin.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/rules/TopDownRule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +/** \addtogroup OptimizerRules
    + *  @{
    + */
    +
    +/**
    + * @brief Rule that applies to a physical plan to fuse a select node in the 
    + *        build predicate of a hash join to the join.
    + */
    +class FuseHashSelect : public TopDownRule<physical::Physical> {
    + public:
    +    /**
    +     * @brief Constructor.
    +     */
    +    FuseHashSelect() {}
    +
    +    ~FuseHashSelect() override {}
    +
    +    std::string getName() const override {
    +      return "FuseHashSelect";
    +    }
    +
    + protected:
    +    physical::PhysicalPtr applyToNode(const physical::PhysicalPtr &node) override;
    +
    + private:
    +    DISALLOW_COPY_AND_ASSIGN(FuseHashSelect);
    +};
    +
    +}  // namespace optimizer
    +}  // namespace quickstep
    +
    +#endif /* QUICKSTEP_QUERY_OPTIMIZER_RULES_FUSE_HASH_SELECT_HPP_ */
    --- End diff --
    
    The new style is `#endif  // QUICKSTEP_QUERY_OPTIMIZER_RULES_FUSE_HASH_SELECT_HPP_`.


---

[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/300
  
    @zuyu Currently `HashJoinOperator` performance is sensitive to the _number of build blocks per probe block_ due to the concurrency bottleneck within LRU policy enforcer.
    
    Consider the situation that the build side relation has `N` blocks and the number of blocks decreases to `M` after applying the predicate, where `N` is very large but `M` is small. Then materializing the filtered build-side relation incurs only a small overhead, but it dramatically reduces the _number of build blocks per probe block_.
    



---

[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...

Posted by dylanpbacon <gi...@git.apache.org>.
Github user dylanpbacon commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/300
  
    This PR is now consistent with the changes @jianqiao made. It is ready to be merged, I believe. I do not have rights for this yet I think.


---

[GitHub] incubator-quickstep pull request #300: QUICKSTEP-106: Hash-Join-Fuse: Featur...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/300#discussion_r140570548
  
    --- Diff: query_optimizer/PhysicalGenerator.cpp ---
    @@ -175,6 +176,8 @@ P::PhysicalPtr PhysicalGenerator::optimizePlan() {
         rules.push_back(std::make_unique<Partition>(optimizer_context_));
       }
     
    +  rules.emplace_back(new FuseHashSelect());
    --- End diff --
    
    You may want to add a `gflags` with a default `true` value to switch between using this rule or not, w/o rebuilding the binary.


---

[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/300
  
    Merging.


---

[GitHub] incubator-quickstep pull request #300: QUICKSTEP-106: Hash-Join-Fuse: Featur...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/300#discussion_r140569985
  
    --- Diff: query_optimizer/ExecutionGenerator.cpp ---
    @@ -939,6 +939,15 @@ void ExecutionGenerator::convertHashJoin(const P::HashJoinPtr &physical_plan) {
         query_context_proto_->add_predicates()->CopyFrom(residual_predicate->getProto());
       }
     
    +  // Convert the build predicate proto.
    +  QueryContext::predicate_id build_predicate_index = QueryContext::kInvalidPredicateId;
    +  if (physical_plan->build_predicate()) {
    +    build_predicate_index = query_context_proto_->predicates_size();
    +
    +    unique_ptr<const Predicate> build_predicate(convertPredicate(physical_plan->build_predicate()));
    +    query_context_proto_->add_predicates()->CopyFrom(build_predicate->getProto());
    --- End diff --
    
    Change to `MergeFrom` to avoid an unnecessary reset function call.


---

[GitHub] incubator-quickstep pull request #300: QUICKSTEP-106: Hash-Join-Fuse: Featur...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/300#discussion_r140575177
  
    --- Diff: query_optimizer/rules/FuseHashSelect.hpp ---
    @@ -0,0 +1,67 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_FUSE_HASH_SELECT_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_FUSE_HASH_SELECT_HPP_
    +
    +#include <cstdint>
    +#include <memory>
    +#include <string>
    +
    +
    --- End diff --
    
    Remove this duplicated empty line, and unused headers, if any.


---

[GitHub] incubator-quickstep pull request #300: QUICKSTEP-106: Hash-Join-Fuse: Featur...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/300#discussion_r140575916
  
    --- Diff: query_optimizer/strategy/Join.cpp ---
    @@ -371,6 +373,7 @@ void Join::addHashJoin(const logical::ProjectPtr &logical_project,
                               left_join_attributes,
                               right_join_attributes,
                               residual_predicate,
    +                          build_predicate,
    --- End diff --
    
    Optionally, I think `E::PredicatePtr()  /* build_predicate */,` is better.


---