You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@quickstep.apache.org by jianqiao <gi...@git.apache.org> on 2017/02/02 20:25:14 UTC

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

GitHub user jianqiao opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/177

    Reduce the number of group-by attributes by pulling tables up aggregation.

    This PR implements an optimization (physical plan transformation) that pulls a table up an aggregation if many of that table's attributes serve as group-by attributes in the aggregation. We do the optimization because it is relatively slow to aggregate with a large set of group-by attributes, as well as to avoid copying the table's many attributes all the way up a chain of operators.
    
    For example, let `R` be a relation with `PRIMARY KEY x` and attributes `y`, `z`. Let `S` be a relation with `FOREIGN KEY u` refering to `R.x` and attribute `v`. Then the optimization rule will transform the physical plan:
    
    ```
    Aggregate(
      [input relation]: HashJoin(
                          [probe relation]: S
                          [build relation]: R
                          [join expression]: S.u = R.x
                          [project attributes]: v, x, y, z
                        )
      [aggregate expression]: SUM(v) AS sum_v
      [group-by attributes]: x, y, z
    )
    ``` 
    into:
    ```
    HashJoin(
      [probe relation]: Aggregate(
                          [input relation]: S
                          [aggregate expression]: SUM(v) AS sum_v
                          [group-by attribute]: u
                        ) AS T
      [build relation]: R
      [join expression]: T.u = R.x
      [project attributes]: sum_v, x, y, z
    )
    ```
    
    This optimization improves the performance of TPC-H Q10 from ~13s to ~5.7s, with scale factor 100 on a cloudlab machine.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep reduce-group-by-attrs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #177
    
----
commit 83d1b592445eb27ad34d56ded14618b019bdd11d
Author: Jianqiao Zhu <ji...@cs.wisc.edu>
Date:   2017-01-30T00:36:14Z

    Reduce the number of group-by attributes by pulling tables up aggregations

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/177#discussion_r99277242
  
    --- Diff: query_optimizer/rules/ReduceGroupByAttributes.cpp ---
    @@ -0,0 +1,217 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "query_optimizer/rules/ReduceGroupByAttributes.hpp"
    +
    +#include <algorithm>
    +#include <map>
    +#include <vector>
    +#include <unordered_set>
    +#include <utility>
    +
    +#include "catalog/CatalogRelation.hpp"
    +#include "query_optimizer/OptimizerContext.hpp"
    +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/expressions/ExpressionUtil.hpp"
    +#include "query_optimizer/expressions/NamedExpression.hpp"
    +#include "query_optimizer/physical/Aggregate.hpp"
    +#include "query_optimizer/physical/HashJoin.hpp"
    +#include "query_optimizer/physical/PatternMatcher.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/PhysicalType.hpp"
    +#include "query_optimizer/physical/TableReference.hpp"
    +#include "query_optimizer/physical/TopLevelPlan.hpp"
    +#include "query_optimizer/rules/PruneColumns.hpp"
    +
    +#include "gflags/gflags.h"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +DEFINE_uint64(reduce_group_by_attributes_threshold, 3u,
    +              "The threshold for a stored relation's number of attributes in a "
    +              "group-by clause for the ReduceGroupByAttributes optimization "
    +              "rule to pull the stored relation up the aggregation");
    +
    +namespace E = ::quickstep::optimizer::expressions;
    +namespace P = ::quickstep::optimizer::physical;
    +
    +P::PhysicalPtr ReduceGroupByAttributes::apply(const P::PhysicalPtr &input) {
    +  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
    +  cost_model_.reset(new cost::StarSchemaSimpleCostModel(
    +      std::static_pointer_cast<const P::TopLevelPlan>(input)->shared_subplans()));
    +
    +  P::PhysicalPtr output = applyInternal(input);
    +  if (output != input) {
    +    output = PruneColumns().apply(output);
    +  }
    +  return output;
    +}
    +
    +P::PhysicalPtr ReduceGroupByAttributes::applyInternal(const P::PhysicalPtr &input) {
    +  std::vector<P::PhysicalPtr> new_children;
    +  for (const P::PhysicalPtr &child : input->children()) {
    +    new_children.push_back(applyInternal(child));
    +  }
    +
    +  if (new_children != input->children()) {
    +    return applyToNode(input->copyWithNewChildren(new_children));
    +  } else {
    +    return applyToNode(input);
    +  }
    +}
    +
    +P::PhysicalPtr ReduceGroupByAttributes::applyToNode(const P::PhysicalPtr &input) {
    +  P::TableReferencePtr table_reference;
    +  if (P::SomeTableReference::MatchesWithConditionalCast(input, &table_reference)) {
    +    // Collect the attributes-to-TableReference mapping info.
    +    for (const auto &attr : table_reference->attribute_list()) {
    +      source_.emplace(attr->id(), std::make_pair(table_reference, attr));
    +    }
    +    return input;
    +  }
    +
    +  P::AggregatePtr aggregate;
    +  if (!P::SomeAggregate::MatchesWithConditionalCast(input, &aggregate) ||
    +      aggregate->grouping_expressions().size() <= 1u) {
    +    return input;
    +  }
    +
    +  // Divide the group-by attributes into groups based on their source table.
    +  std::map<P::TableReferencePtr, std::vector<E::AttributeReferencePtr>> table_attributes;
    --- End diff --
    
    Could we use a faster `unordered_map` instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/177#discussion_r99277016
  
    --- Diff: query_optimizer/Optimizer.cpp ---
    @@ -30,10 +30,11 @@ void Optimizer::generateQueryHandle(const ParseStatement &parse_statement,
                                         OptimizerContext *optimizer_context,
                                         QueryHandle *query_handle) {
       LogicalGenerator logical_generator(optimizer_context);
    +  PhysicalGenerator physical_generator(optimizer_context);
    --- End diff --
    
    I was trying to make query optimizer stateless regarding an input query. I'd suggest to pass `optimizer_context` in `PhysicalGenerator::generatePlan`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/177


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #177: Reduce the number of group-by attributes by ...

Posted by pateljm <gi...@git.apache.org>.
Github user pateljm commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/177
  
    @jianqiao Another neat optimization! I also like how your PRs are clean. Thank you for adding the pdf, which makes understanding this optimization super easy. 
    
    @zuyu Great comments. I'd like to suggest that given the scope of the PR, the requests you make be rolled into a future PR.
    
    I'm happy to close this PR -- so @jianqiao feel free to assign this to me.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/177#discussion_r99964669
  
    --- Diff: query_optimizer/rules/ReduceGroupByAttributes.hpp ---
    @@ -0,0 +1,143 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +
    +#include <cstddef>
    +#include <memory>
    +#include <string>
    +#include <unordered_map>
    +#include <utility>
    +
    +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/TableReference.hpp"
    +#include "query_optimizer/rules/Rule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +class OptimizerContext;
    +
    +/**
    + * @brief Rule that applies to a physical plan to reduce the number of group-by
    + *        attributes for Aggregate nodes (to improve performance) by pulling
    + *        joins up the aggregations.
    + *
    + * For example, let R be a relation with PRIMARY KEY x and attributes y, z. Let
    + * S be a relation with FOREIGN KEY u refering to R(x) and attribute v. Then the
    + * optimization rule will transform the physical plan:
    + *   Aggregate(
    + *     [input relation]: HashJoin(
    + *                         [probe relation]: S
    + *                         [build relation]: R
    + *                         [join expression]: S.u = R.x
    + *                         [project attributes]: v, x, y, z
    + *                       )
    + *     [aggregate expression]: SUM(v) AS sum_v
    + *     [group-by attributes]: x, y, z
    + *   )
    + *
    + * into:
    + *   HashJoin(
    + *     [probe relation]: Aggregate(
    + *                         [input relation]: S
    + *                         [aggregate expression]: SUM(v) AS sum_v
    + *                         [group-by attribute]: u
    + *                       ) AS T
    + *     [build relation]: R
    + *     [join expression]: T.u = R.x
    + *     [project attributes]: sum_v, x, y, z
    + *   )
    + */
    +class ReduceGroupByAttributes : public Rule<physical::Physical> {
    + public:
    +  /**
    +   * @brief Constructor.
    +   *
    +   * @param optimizer_context The optimizer context.
    +   */
    +  explicit ReduceGroupByAttributes(OptimizerContext *optimizer_context)
    +      : optimizer_context_(optimizer_context) {}
    +
    +  ~ReduceGroupByAttributes() override {}
    +
    +  std::string getName() const override {
    +    return "ReduceGroupByAttributes";
    +  }
    +
    +  physical::PhysicalPtr apply(const physical::PhysicalPtr &input) override;
    +
    + private:
    +  struct AttributeInfo {
    +    AttributeInfo(const expressions::AttributeReferencePtr &attribute_in,
    +                  const bool is_unique_in,
    +                  const bool is_fixed_length_in,
    +                  const std::size_t maximum_size_in)
    +        : attribute(attribute_in),
    +          is_unique(is_unique_in),
    +          is_fixed_length(is_fixed_length_in),
    +          maximum_size(maximum_size_in) {}
    +
    +    // In the situation that there are multiple attributes that can serve as the
    +    // group-by key, we define an ordering based on aggregation performance (e.g.
    +    // it is faster to do aggregation with a fix-length attribute as the group-by
    +    // key than with a variable-length attribute).
    +    inline static bool IsBetterThan(const AttributeInfo *lhs,
    +                                    const AttributeInfo *rhs) {
    +      if (lhs->is_unique != rhs->is_unique) {
    +        return lhs->is_unique;
    +      }
    +      if (lhs->is_fixed_length != rhs->is_fixed_length) {
    +        return lhs->is_fixed_length;
    +      }
    +      if (lhs->maximum_size != rhs->maximum_size) {
    +        return lhs->maximum_size < rhs->maximum_size;
    +      }
    +      return lhs->attribute->id() < rhs->attribute->id();
    +    }
    +
    +    const expressions::AttributeReferencePtr attribute;
    +    const bool is_unique;
    +    const bool is_fixed_length;
    +    const std::size_t maximum_size;
    --- End diff --
    
    Then, please rename to `maximum_byte_length`, as defined in `Type`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/177#discussion_r99865239
  
    --- Diff: query_optimizer/rules/ReduceGroupByAttributes.cpp ---
    @@ -0,0 +1,217 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "query_optimizer/rules/ReduceGroupByAttributes.hpp"
    +
    +#include <algorithm>
    +#include <map>
    +#include <vector>
    +#include <unordered_set>
    +#include <utility>
    +
    +#include "catalog/CatalogRelation.hpp"
    +#include "query_optimizer/OptimizerContext.hpp"
    +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/expressions/ExpressionUtil.hpp"
    +#include "query_optimizer/expressions/NamedExpression.hpp"
    +#include "query_optimizer/physical/Aggregate.hpp"
    +#include "query_optimizer/physical/HashJoin.hpp"
    +#include "query_optimizer/physical/PatternMatcher.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/PhysicalType.hpp"
    +#include "query_optimizer/physical/TableReference.hpp"
    +#include "query_optimizer/physical/TopLevelPlan.hpp"
    +#include "query_optimizer/rules/PruneColumns.hpp"
    +
    +#include "gflags/gflags.h"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +DEFINE_uint64(reduce_group_by_attributes_threshold, 3u,
    +              "The threshold for a stored relation's number of attributes in a "
    +              "group-by clause for the ReduceGroupByAttributes optimization "
    +              "rule to pull the stored relation up the aggregation");
    +
    +namespace E = ::quickstep::optimizer::expressions;
    +namespace P = ::quickstep::optimizer::physical;
    +
    +P::PhysicalPtr ReduceGroupByAttributes::apply(const P::PhysicalPtr &input) {
    +  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
    +  cost_model_.reset(new cost::StarSchemaSimpleCostModel(
    +      std::static_pointer_cast<const P::TopLevelPlan>(input)->shared_subplans()));
    +
    +  P::PhysicalPtr output = applyInternal(input);
    +  if (output != input) {
    +    output = PruneColumns().apply(output);
    +  }
    +  return output;
    +}
    +
    +P::PhysicalPtr ReduceGroupByAttributes::applyInternal(const P::PhysicalPtr &input) {
    +  std::vector<P::PhysicalPtr> new_children;
    +  for (const P::PhysicalPtr &child : input->children()) {
    +    new_children.push_back(applyInternal(child));
    +  }
    +
    +  if (new_children != input->children()) {
    +    return applyToNode(input->copyWithNewChildren(new_children));
    +  } else {
    +    return applyToNode(input);
    +  }
    +}
    +
    +P::PhysicalPtr ReduceGroupByAttributes::applyToNode(const P::PhysicalPtr &input) {
    +  P::TableReferencePtr table_reference;
    +  if (P::SomeTableReference::MatchesWithConditionalCast(input, &table_reference)) {
    +    // Collect the attributes-to-TableReference mapping info.
    +    for (const auto &attr : table_reference->attribute_list()) {
    +      source_.emplace(attr->id(), std::make_pair(table_reference, attr));
    +    }
    +    return input;
    +  }
    +
    +  P::AggregatePtr aggregate;
    +  if (!P::SomeAggregate::MatchesWithConditionalCast(input, &aggregate) ||
    +      aggregate->grouping_expressions().size() <= 1u) {
    +    return input;
    +  }
    +
    +  // Divide the group-by attributes into groups based on their source table.
    +  std::map<P::TableReferencePtr, std::vector<E::AttributeReferencePtr>> table_attributes;
    --- End diff --
    
    We would expect tables attributes to contain a small number of entries (# of storage tables for the attributes in one group-by clause, typically 1 to 3). Also consider that there an iteration of all the map elements below. It might be better to use a `std::map` here. Anyway it won't make even a tiny performance difference to use either data structure here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/177#discussion_r99886212
  
    --- Diff: query_optimizer/rules/ReduceGroupByAttributes.hpp ---
    @@ -0,0 +1,143 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +
    +#include <cstddef>
    +#include <memory>
    +#include <string>
    +#include <unordered_map>
    +#include <utility>
    +
    +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/TableReference.hpp"
    +#include "query_optimizer/rules/Rule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +class OptimizerContext;
    +
    +/**
    + * @brief Rule that applies to a physical plan to reduce the number of group-by
    + *        attributes for Aggregate nodes (to improve performance) by pulling
    + *        joins up the aggregations.
    + *
    + * For example, let R be a relation with PRIMARY KEY x and attributes y, z. Let
    + * S be a relation with FOREIGN KEY u refering to R(x) and attribute v. Then the
    + * optimization rule will transform the physical plan:
    + *   Aggregate(
    + *     [input relation]: HashJoin(
    + *                         [probe relation]: S
    + *                         [build relation]: R
    + *                         [join expression]: S.u = R.x
    + *                         [project attributes]: v, x, y, z
    + *                       )
    + *     [aggregate expression]: SUM(v) AS sum_v
    + *     [group-by attributes]: x, y, z
    + *   )
    + *
    + * into:
    + *   HashJoin(
    + *     [probe relation]: Aggregate(
    + *                         [input relation]: S
    + *                         [aggregate expression]: SUM(v) AS sum_v
    + *                         [group-by attribute]: u
    + *                       ) AS T
    + *     [build relation]: R
    + *     [join expression]: T.u = R.x
    + *     [project attributes]: sum_v, x, y, z
    + *   )
    --- End diff --
    
    Do we still keep this somewhat misleading example, or will update it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/177#discussion_r99865183
  
    --- Diff: query_optimizer/Optimizer.cpp ---
    @@ -30,10 +30,11 @@ void Optimizer::generateQueryHandle(const ParseStatement &parse_statement,
                                         OptimizerContext *optimizer_context,
                                         QueryHandle *query_handle) {
       LogicalGenerator logical_generator(optimizer_context);
    +  PhysicalGenerator physical_generator(optimizer_context);
    --- End diff --
    
    This might be done later together with the same change to `LogicalGenerator`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/177#discussion_r99963866
  
    --- Diff: query_optimizer/rules/ReduceGroupByAttributes.hpp ---
    @@ -0,0 +1,143 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +
    +#include <cstddef>
    +#include <memory>
    +#include <string>
    +#include <unordered_map>
    +#include <utility>
    +
    +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/TableReference.hpp"
    +#include "query_optimizer/rules/Rule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +class OptimizerContext;
    +
    +/**
    + * @brief Rule that applies to a physical plan to reduce the number of group-by
    + *        attributes for Aggregate nodes (to improve performance) by pulling
    + *        joins up the aggregations.
    + *
    + * For example, let R be a relation with PRIMARY KEY x and attributes y, z. Let
    + * S be a relation with FOREIGN KEY u refering to R(x) and attribute v. Then the
    + * optimization rule will transform the physical plan:
    + *   Aggregate(
    + *     [input relation]: HashJoin(
    + *                         [probe relation]: S
    + *                         [build relation]: R
    + *                         [join expression]: S.u = R.x
    + *                         [project attributes]: v, x, y, z
    + *                       )
    + *     [aggregate expression]: SUM(v) AS sum_v
    + *     [group-by attributes]: x, y, z
    + *   )
    + *
    + * into:
    + *   HashJoin(
    + *     [probe relation]: Aggregate(
    + *                         [input relation]: S
    + *                         [aggregate expression]: SUM(v) AS sum_v
    + *                         [group-by attribute]: u
    + *                       ) AS T
    + *     [build relation]: R
    + *     [join expression]: T.u = R.x
    + *     [project attributes]: sum_v, x, y, z
    + *   )
    + */
    +class ReduceGroupByAttributes : public Rule<physical::Physical> {
    + public:
    +  /**
    +   * @brief Constructor.
    +   *
    +   * @param optimizer_context The optimizer context.
    +   */
    +  explicit ReduceGroupByAttributes(OptimizerContext *optimizer_context)
    +      : optimizer_context_(optimizer_context) {}
    +
    +  ~ReduceGroupByAttributes() override {}
    +
    +  std::string getName() const override {
    +    return "ReduceGroupByAttributes";
    +  }
    +
    +  physical::PhysicalPtr apply(const physical::PhysicalPtr &input) override;
    +
    + private:
    +  struct AttributeInfo {
    +    AttributeInfo(const expressions::AttributeReferencePtr &attribute_in,
    +                  const bool is_unique_in,
    +                  const bool is_fixed_length_in,
    +                  const std::size_t maximum_size_in)
    +        : attribute(attribute_in),
    +          is_unique(is_unique_in),
    +          is_fixed_length(is_fixed_length_in),
    +          maximum_size(maximum_size_in) {}
    +
    +    // In the situation that there are multiple attributes that can serve as the
    +    // group-by key, we define an ordering based on aggregation performance (e.g.
    +    // it is faster to do aggregation with a fix-length attribute as the group-by
    +    // key than with a variable-length attribute).
    +    inline static bool IsBetterThan(const AttributeInfo *lhs,
    +                                    const AttributeInfo *rhs) {
    +      if (lhs->is_unique != rhs->is_unique) {
    +        return lhs->is_unique;
    +      }
    +      if (lhs->is_fixed_length != rhs->is_fixed_length) {
    +        return lhs->is_fixed_length;
    +      }
    +      if (lhs->maximum_size != rhs->maximum_size) {
    +        return lhs->maximum_size < rhs->maximum_size;
    +      }
    +      return lhs->attribute->id() < rhs->attribute->id();
    +    }
    +
    +    const expressions::AttributeReferencePtr attribute;
    +    const bool is_unique;
    +    const bool is_fixed_length;
    +    const std::size_t maximum_size;
    --- End diff --
    
    `maximum_size` is the maximum byte size of the attribute's values. It is used for comparison -- i.e. we prefer attributes with smaller maximum-byte-sizes as group-by keys.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/177#discussion_r99964422
  
    --- Diff: query_optimizer/rules/ReduceGroupByAttributes.hpp ---
    @@ -0,0 +1,143 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +
    +#include <cstddef>
    +#include <memory>
    +#include <string>
    +#include <unordered_map>
    +#include <utility>
    +
    +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/TableReference.hpp"
    +#include "query_optimizer/rules/Rule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +class OptimizerContext;
    +
    +/**
    + * @brief Rule that applies to a physical plan to reduce the number of group-by
    + *        attributes for Aggregate nodes (to improve performance) by pulling
    + *        joins up the aggregations.
    + *
    + * For example, let R be a relation with PRIMARY KEY x and attributes y, z. Let
    + * S be a relation with FOREIGN KEY u refering to R(x) and attribute v. Then the
    + * optimization rule will transform the physical plan:
    + *   Aggregate(
    + *     [input relation]: HashJoin(
    + *                         [probe relation]: S
    + *                         [build relation]: R
    + *                         [join expression]: S.u = R.x
    + *                         [project attributes]: v, x, y, z
    + *                       )
    + *     [aggregate expression]: SUM(v) AS sum_v
    + *     [group-by attributes]: x, y, z
    + *   )
    + *
    + * into:
    + *   HashJoin(
    + *     [probe relation]: Aggregate(
    + *                         [input relation]: S
    + *                         [aggregate expression]: SUM(v) AS sum_v
    + *                         [group-by attribute]: u
    + *                       ) AS T
    + *     [build relation]: R
    + *     [join expression]: T.u = R.x
    + *     [project attributes]: sum_v, x, y, z
    + *   )
    --- End diff --
    
    I will update the header later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #177: Reduce the number of group-by attribu...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/177#discussion_r99887277
  
    --- Diff: query_optimizer/rules/ReduceGroupByAttributes.hpp ---
    @@ -0,0 +1,143 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REDUCE_GROUP_BY_ATTRIBUTES_HPP_
    +
    +#include <cstddef>
    +#include <memory>
    +#include <string>
    +#include <unordered_map>
    +#include <utility>
    +
    +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/TableReference.hpp"
    +#include "query_optimizer/rules/Rule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +class OptimizerContext;
    +
    +/**
    + * @brief Rule that applies to a physical plan to reduce the number of group-by
    + *        attributes for Aggregate nodes (to improve performance) by pulling
    + *        joins up the aggregations.
    + *
    + * For example, let R be a relation with PRIMARY KEY x and attributes y, z. Let
    + * S be a relation with FOREIGN KEY u refering to R(x) and attribute v. Then the
    + * optimization rule will transform the physical plan:
    + *   Aggregate(
    + *     [input relation]: HashJoin(
    + *                         [probe relation]: S
    + *                         [build relation]: R
    + *                         [join expression]: S.u = R.x
    + *                         [project attributes]: v, x, y, z
    + *                       )
    + *     [aggregate expression]: SUM(v) AS sum_v
    + *     [group-by attributes]: x, y, z
    + *   )
    + *
    + * into:
    + *   HashJoin(
    + *     [probe relation]: Aggregate(
    + *                         [input relation]: S
    + *                         [aggregate expression]: SUM(v) AS sum_v
    + *                         [group-by attribute]: u
    + *                       ) AS T
    + *     [build relation]: R
    + *     [join expression]: T.u = R.x
    + *     [project attributes]: sum_v, x, y, z
    + *   )
    + */
    +class ReduceGroupByAttributes : public Rule<physical::Physical> {
    + public:
    +  /**
    +   * @brief Constructor.
    +   *
    +   * @param optimizer_context The optimizer context.
    +   */
    +  explicit ReduceGroupByAttributes(OptimizerContext *optimizer_context)
    +      : optimizer_context_(optimizer_context) {}
    +
    +  ~ReduceGroupByAttributes() override {}
    +
    +  std::string getName() const override {
    +    return "ReduceGroupByAttributes";
    +  }
    +
    +  physical::PhysicalPtr apply(const physical::PhysicalPtr &input) override;
    +
    + private:
    +  struct AttributeInfo {
    +    AttributeInfo(const expressions::AttributeReferencePtr &attribute_in,
    +                  const bool is_unique_in,
    +                  const bool is_fixed_length_in,
    +                  const std::size_t maximum_size_in)
    +        : attribute(attribute_in),
    +          is_unique(is_unique_in),
    +          is_fixed_length(is_fixed_length_in),
    +          maximum_size(maximum_size_in) {}
    +
    +    // In the situation that there are multiple attributes that can serve as the
    +    // group-by key, we define an ordering based on aggregation performance (e.g.
    +    // it is faster to do aggregation with a fix-length attribute as the group-by
    +    // key than with a variable-length attribute).
    +    inline static bool IsBetterThan(const AttributeInfo *lhs,
    +                                    const AttributeInfo *rhs) {
    +      if (lhs->is_unique != rhs->is_unique) {
    +        return lhs->is_unique;
    +      }
    +      if (lhs->is_fixed_length != rhs->is_fixed_length) {
    +        return lhs->is_fixed_length;
    +      }
    +      if (lhs->maximum_size != rhs->maximum_size) {
    +        return lhs->maximum_size < rhs->maximum_size;
    +      }
    +      return lhs->attribute->id() < rhs->attribute->id();
    +    }
    +
    +    const expressions::AttributeReferencePtr attribute;
    +    const bool is_unique;
    +    const bool is_fixed_length;
    +    const std::size_t maximum_size;
    --- End diff --
    
    Could you please help me understand `maximum_size` here? What does it mean?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #177: Reduce the number of group-by attributes by ...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/177
  
    This optimization is somehow different from `AggregatePushDown`. I updated the example in this PR's description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---