You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@quickstep.apache.org by jianqiao <gi...@git.apache.org> on 2017/04/23 01:33:50 UTC

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

GitHub user jianqiao opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/237

    QUICKSTEP-89 Add support for common subexpression.

    This PR adds support for common subexpression elimination in Quickstep. In summary we apply two kinds of optimizations:
    **(1)** Query rewriting for _aggregate expressions_, e.g.
    ```
    SELECT SUM(x), SUM(x)*2, AVG(x), COUNT(x) FROM s;
    ```
    will be rewritten as
    ```
    SELECT sum_x, sum_x * 2, sum_x / count_x, count_x
    FROM (
      SELECT SUM(x) AS sum_x, COUNT(x) AS count_x
    ) t;
    ```
    **(2)** Common subexpression labeling with **per-storage-block memoization table** to cache result column vectors to avoid redundant computation. For example, the evaluation of expressions in
    ```
    SELECT (x + 1) * (y + 2) * 3, (x + 1) * (y + 2), x + 1
    FROM s;
    ```
    will essentially look like:
    ```
    t1 = x + 1
    t2 = t1 * (y + 2)
    Out1 = t2 * 3
    Out2 = Reference(t2)
    Out3 = Reference(t3)
    ```
    Here `t1`, `t2`, `Out1`, `Out2`, `Out3` are per-storage-block `ColumnVector`'s.
    
    This PR together with a (later) new aggregation hash table implementation will improve the performance of TPC-H Q01 from ~11.5s to ~5.3s, with scale factor 100 on a cloud lab machine.
    
    ___
    **Note**: For optimization (1), note that it is not free-of-cost to "re-project" the columns, so it may not worth doing the transformation in situations where a group-by aggregation has large number of groups. E.g., it may actually slow down the query to rewrite
    ```
    SELECT SUM(a), SUM(b), SUM(c), SUM(d), SUM(a)
    FROM s
    GROUP BY x;
    ```
    as
    ```
    SELECT sum_a, sum_b, sum_c, sum_d, sum_a
    FROM (
      SELECT SUM(a) AS sum_a, SUM(b) AS sum_b, SUM(c) AS sum_c, SUM(d) AS sum_d
      FROM s
      GROUP BY x;
    ) t;
    ```
    if the number of distinct values of attribute x is large -- in that case, we avoid one duplicate computation of `SUM(a)`, but introduce 5 extra expensive column copying due to the intermediate relation `t`.
    
    Thus currently we employ a simple heuristic that if either
    _(1) The estimated number of groups for the Aggregate node is smaller than the specified `FLAGS_reuse_aggregate_group_size_threshold`._
    or
    _(2) The ratio of (# of eliminable columns) to (# of original columns) is greater than the specified `FLAGS_reuse_aggregate_ratio_threshold`._
    then we perform the transformation.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep common-subexpression

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #237
    
----
commit b31f0360a93727ed1250e8e3cf28c242c3d727a1
Author: Jianqiao Zhu <ji...@cs.wisc.edu>
Date:   2017-04-20T20:28:12Z

    Add common-subexpression support.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112848647
  
    --- Diff: query_optimizer/rules/ReuseAggregateExpressions.hpp ---
    @@ -0,0 +1,154 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REUSE_AGGREGATE_EXPRESSIONS_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REUSE_AGGREGATE_EXPRESSIONS_HPP_
    +
    +#include <cstddef>
    +#include <memory>
    +#include <string>
    +#include <vector>
    +
    +#include "query_optimizer/cost_model/StarSchemaSimpleCostModel.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/rules/BottomUpRule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +class OptimizerContext;
    +
    +/** \addtogroup OptimizerRules
    + *  @{
    + */
    +
    +/**
    + * @brief Rule that applies to a physical plan to eliminate duplicate aggregate
    + *        expressions and convert AVG to SUM/COUNT if appropriate.
    + *
    + * This rule rewrites Aggregate to Selection + Aggregate to eliminate duplicates
    + * and strength-reduce AVG functions. E.g.
    + * --
    + *   SELECT SUM(x), SUM(y), SUM(x) * 2, AVG(y), COUNT(*)
    + *   FROM s;
    + * --
    + * will be rewritten as (assume y is not null)
    + * --
    + *   SELECT sum_x, sum_y, sum_x * 2, sum_y / cnt, cnt
    + *   FROM (
    + *     SELECT SUM(x) AS sum_x, SUM(y) AS sum_y, COUNT(*) AS cnt
    + *   ) t;
    + * --
    + *
    + * Meanwhile, note that currently it is not free-of-cost to "re-project" the
    + * columns. So it may not worth doing the transformation in some situations.
    + * E.g. it may actually slow down the query to rewrite
    + * --
    + *   SELECT SUM(a), SUM(b), SUM(c), SUM(d), SUM(a)
    + *   FROM s
    + *   GROUP BY x;
    + * --
    + * as
    + * --
    + *   SELECT sum_a, sum_b, sum_c, sum_d, sum_a
    + *   FROM (
    + *     SELECT SUM(a) AS sum_a, SUM(b) AS sum_b, SUM(c) AS sum_c, SUM(d) AS sum_d
    + *     FROM s
    + *     GROUP BY x;
    + *   ) t;
    + * --
    + * if the number of distinct values of attribute x is large -- in this case, we
    + * avoid one duplicate computation of SUM(a), but introduce 5 extra expensive
    + * column copying with the outer Selection.
    + *
    + * To address the above problem, currently we employ a simple heuristic that if
    + * either
    + * (1) The estimated number of groups for this Aggregate node is smaller than
    + *     the specified FLAGS_reuse_aggregate_group_size_threshold.
    + * or
    + * (2) The ratio of (# of eliminable columns) to (# of original columns) is
    + *     greater than the specified FLAGS_reuse_aggregate_ratio_threshold.
    + * then we perform the transformation.
    + */
    +class ReuseAggregateExpressions : public BottomUpRule<physical::Physical> {
    + public:
    +  /**
    +   * @brief Constructor.
    +   *
    +   * @param optimizer_context The optimizer context.
    +   */
    +  explicit ReuseAggregateExpressions(OptimizerContext *optimizer_context)
    +      : optimizer_context_(optimizer_context) {}
    +
    --- End diff --
    
    Suggest to mark `optimizer_context` `const`.
    
    Also, add a `override` destructor.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r113070383
  
    --- Diff: query_optimizer/expressions/CommonSubexpression.hpp ---
    @@ -0,0 +1,141 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
    +
    +#include <memory>
    +#include <string>
    +#include <unordered_map>
    +#include <vector>
    +
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/expressions/Expression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "utility/Macros.hpp"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +
    +class Scalar;
    +class Type;
    +
    +namespace optimizer {
    +namespace expressions {
    +
    +/** \addtogroup OptimizerExpressions
    + *  @{
    + */
    +
    +class CommonSubexpression;
    +typedef std::shared_ptr<const CommonSubexpression> CommonSubexpressionPtr;
    +
    +/**
    + * @brief A wrapper expression that represents a common subexpression.
    + */
    +class CommonSubexpression : public Scalar {
    + public:
    +  ExpressionType getExpressionType() const override {
    --- End diff --
    
    We may or may not add an `override` destructor since it is a no-op. Considering that all other subclasses of `Scalar` do not have override destructors, we can just omit it for consistency.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/237


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112828399
  
    --- Diff: query_optimizer/expressions/Scalar.hpp ---
    @@ -65,10 +67,49 @@ class Scalar : public Expression {
           const std::unordered_map<ExprId, const CatalogAttribute*>& substitution_map)
           const = 0;
     
    +  /**
    +   * @brief Check whether this scalar is semantically equivalent to \p other.
    +   *
    +   * @note The fact that two scalars are semantically equal brings more
    +   *       optimization opportunities, e.g. common subexpression elimination.
    +   *       Meanwhile, it is always safe to assume that two scalars are not equal.
    +   *
    +   * @return True if this scalar equals \p other; false otherwise.
    +   */
    +  virtual bool equals(const ScalarPtr &other) const {
    +    return false;
    +  }
    +
    +  /**
    +   * @brief Get a hash of this scalar.
    +   *
    +   * @return A hash of this scalar.
    +   */
    +  std::size_t hash() const {
    +    if (hash_cache_ == nullptr) {
    +      hash_cache_ = std::make_unique<std::size_t>(computeHash());
    +    }
    +    return *hash_cache_;
    --- End diff --
    
    What do you think if we define `hash_cache_` as a `std::uint64_t` so that we save a pointer reference?
    
    Thus, if its value is `0`, we compute the hash value. Otherwise, return the value.
    
    FYI, sizeof `std::size_t` depends on the system implementation, so I suggest to the 64-bit integer instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #237: QUICKSTEP-89 Add support for common subexpre...

Posted by pateljm <gi...@git.apache.org>.
Github user pateljm commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/237
  
    @zuyu -- can you expand on the bugs? The tests all passed (except for the one that timed out).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112827934
  
    --- Diff: expressions/scalar/ScalarSharedExpression.cpp ---
    @@ -0,0 +1,141 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "expressions/scalar/ScalarSharedExpression.hpp"
    +
    +#include <string>
    +#include <utility>
    +#include <vector>
    +
    +#include "catalog/CatalogTypedefs.hpp"
    +#include "expressions/Expressions.pb.h"
    +#include "storage/ValueAccessor.hpp"
    +#include "types/TypedValue.hpp"
    +#include "types/containers/ColumnVector.hpp"
    +#include "utility/ColumnVectorCache.hpp"
    +
    +namespace quickstep {
    +
    +struct SubBlocksReference;
    +
    +ScalarSharedExpression::ScalarSharedExpression(const int share_id,
    +                                               Scalar *operand)
    +    : Scalar(operand->getType()),
    +      share_id_(share_id),
    +      operand_(operand) {
    +}
    +
    +serialization::Scalar ScalarSharedExpression::getProto() const {
    +  serialization::Scalar proto;
    +  proto.set_data_source(serialization::Scalar::SHARED_EXPRESSION);
    +  proto.SetExtension(serialization::ScalarSharedExpression::share_id, share_id_);
    +  proto.MutableExtension(serialization::ScalarSharedExpression::operand)
    +      ->CopyFrom(operand_->getProto());
    +
    +  return proto;
    +}
    +
    +Scalar* ScalarSharedExpression::clone() const {
    +  return new ScalarSharedExpression(share_id_, operand_->clone());
    +}
    +
    +TypedValue ScalarSharedExpression::getValueForSingleTuple(const ValueAccessor &accessor,
    +                                                          const tuple_id tuple) const {
    +  return operand_->getValueForSingleTuple(accessor, tuple);
    +}
    +
    +TypedValue ScalarSharedExpression::getValueForJoinedTuples(
    +    const ValueAccessor &left_accessor,
    +    const relation_id left_relation_id,
    +    const tuple_id left_tuple_id,
    +    const ValueAccessor &right_accessor,
    +    const relation_id right_relation_id,
    +    const tuple_id right_tuple_id) const {
    +  return operand_->getValueForJoinedTuples(left_accessor,
    +                                           left_relation_id,
    +                                           left_tuple_id,
    +                                           right_accessor,
    +                                           right_relation_id,
    +                                           right_tuple_id);
    +}
    +
    +ColumnVectorPtr ScalarSharedExpression::getAllValues(
    +    ValueAccessor *accessor,
    +    const SubBlocksReference *sub_blocks_ref,
    +    ColumnVectorCache *cv_cache) const {
    +  if (cv_cache == nullptr) {
    +    return operand_->getAllValues(accessor, sub_blocks_ref, cv_cache);
    +  } else {
    +    ColumnVectorPtr result;
    +    if (cv_cache->contains(share_id_)) {
    +      result = cv_cache->get(share_id_);
    +    } else {
    +      result = operand_->getAllValues(accessor, sub_blocks_ref, cv_cache);
    +      cv_cache->set(share_id_, result);
    +    }
    +    return result;
    +  }
    --- End diff --
    
    Suggest to change to
    
    ```
      if (cv_cache == nullptr) {
        return operand_->getAllValues(accessor, sub_blocks_ref, cv_cache);
      }
    
      ColumnVectorPtr result;
      if (cv_cache->contains(share_id_)) {
        result = cv_cache->get(share_id_);
      } else {
        result = operand_->getAllValues(accessor, sub_blocks_ref, cv_cache);
        cv_cache->set(share_id_, result);
      }
      return result;
    ```
    
    Ditto for `getAllValuesForJoin` below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #237: QUICKSTEP-89 Add support for common subexpre...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/237
  
    @jianqiao This PR has a bug and some issues to fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112848194
  
    --- Diff: query_optimizer/rules/ExtractCommonSubexpression.hpp ---
    @@ -0,0 +1,179 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_EXTRACT_COMMON_SUBEXPRESSION_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_EXTRACT_COMMON_SUBEXPRESSION_HPP_
    +
    +#include <cstddef>
    +#include <functional>
    +#include <memory>
    +#include <string>
    +#include <type_traits>
    +#include <unordered_map>
    +#include <unordered_set>
    +#include <vector>
    +
    +#include "query_optimizer/expressions/CommonSubexpression.hpp"
    +#include "query_optimizer/expressions/Expression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/rules/Rule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +class OptimizerContext;
    +
    +/** \addtogroup OptimizerRules
    + *  @{
    + */
    +
    +/**
    + * @brief Rule that applies to a physical plan to identify and label common
    + *        subexpressions.
    + *
    + * @note This is essentially a logical optimization pass. However, we need some
    + *       of the physical passes (e.g. ReuseAggregateExpressions) to be finalized
    + *       before this one to maximize optimization opportunities.
    + */
    +class ExtractCommonSubexpression : public Rule<physical::Physical> {
    + public:
    +  /**
    +   * @brief Constructor.
    +   *
    +   * @param optimizer_context The optimizer context.
    +   */
    +  explicit ExtractCommonSubexpression(OptimizerContext *optimizer_context);
    +
    +  ~ExtractCommonSubexpression() override {}
    +
    +  std::string getName() const override {
    +    return "ExtractCommonSubexpression";
    +  }
    +
    +  physical::PhysicalPtr apply(const physical::PhysicalPtr &input) override;
    +
    + private:
    +  physical::PhysicalPtr applyInternal(const physical::PhysicalPtr &input);
    +
    +  struct ScalarHash {
    +    inline std::size_t operator()(const expressions::ScalarPtr &scalar) const {
    +      return scalar->hash();
    +    }
    +  };
    +
    +  struct ScalarEqual {
    +    inline bool operator()(const expressions::ScalarPtr &lhs,
    +                           const expressions::ScalarPtr &rhs) const {
    +      return lhs->equals(rhs);
    +    }
    +  };
    +
    +  // For memorizing whether a subexpression is hashable.
    +  using ScalarHashable = std::unordered_set<expressions::ScalarPtr>;
    +
    +  // For counting the number of occurrences of each subexpression.
    +  using ScalarCounter =
    +      std::unordered_map<expressions::ScalarPtr, std::size_t, ScalarHash, ScalarEqual>;
    +
    +  // For mapping each subexpression to its transformed version.
    +  using ScalarMap =
    +      std::unordered_map<expressions::ScalarPtr,
    +                         expressions::CommonSubexpressionPtr,
    +                         ScalarHash,
    +                         ScalarEqual>;
    +
    +  std::vector<expressions::ExpressionPtr> transformExpressions(
    +      const std::vector<expressions::ExpressionPtr> &expressions);
    +
    +  expressions::ExpressionPtr transformExpression(
    +      const expressions::ExpressionPtr &expression);
    +
    +  // Traverse the expression tree and increase the count of each subexpression.
    +  bool visitAndCount(
    +      const expressions::ExpressionPtr &expression,
    +      ScalarCounter *counter,
    +      ScalarHashable *hashable);
    +
    +  // Traverse the expression tree and transform subexpressions (to common
    +  // subexpressions) if applicable.
    +  expressions::ExpressionPtr visitAndTransform(
    --- End diff --
    
    I believe multiple methods above including `transformExpressions ` could be marked as `const`, as they do not modify the data members in the class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112845784
  
    --- Diff: query_optimizer/expressions/AttributeReference.cpp ---
    @@ -57,6 +60,22 @@ ::quickstep::Scalar *AttributeReference::concretize(
       return new ::quickstep::ScalarAttribute(*found_it->second);
     }
     
    +std::size_t AttributeReference::computeHash() const {
    +  return std::hash<std::size_t>()(static_cast<std::size_t>(id()));
    --- End diff --
    
    FYI, the hash implementation for an integer is to return itself. So, we could just return `id()`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r113066931
  
    --- Diff: expressions/scalar/ScalarCaseExpression.cpp ---
    @@ -420,15 +429,15 @@ void ScalarCaseExpression::MultiplexNativeColumnVector(
     void ScalarCaseExpression::MultiplexIndirectColumnVector(
         const TupleIdSequence *source_sequence,
         const TupleIdSequence &case_matches,
    -    IndirectColumnVector *case_result,
    +    const IndirectColumnVector &case_result,
         IndirectColumnVector *output) {
       if (source_sequence == nullptr) {
         TupleIdSequence::const_iterator output_pos_it = case_matches.begin();
         for (std::size_t input_pos = 0;
    -         input_pos < case_result->size();
    +         input_pos < case_result.size();
              ++input_pos, ++output_pos_it) {
           output->positionalWriteTypedValue(*output_pos_it,
    -                                        case_result->moveTypedValue(input_pos));
    --- End diff --
    
    `moveTypedValue ` still makes sense as a public method that might be used in the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #237: QUICKSTEP-89 Add support for common subexpre...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/237
  
    @pateljm This is a minor bug not caught by the tests. See [the comment](https://github.com/apache/incubator-quickstep/pull/237#discussion_r112826903).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112847895
  
    --- Diff: query_optimizer/rules/CollapseSelection.hpp ---
    @@ -0,0 +1,62 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_COLLAPSE_SELECTION_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_COLLAPSE_SELECTION_HPP_
    +
    +#include <string>
    +
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/rules/BottomUpRule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +/** \addtogroup OptimizerRules
    + *  @{
    + */
    +
    +/**
    + * @brief Merges nested Selections into one single Selection.
    + */
    +class CollapseSelection : public BottomUpRule<physical::Physical> {
    + public:
    +  /**
    +   * @brief Constructor.
    +   */
    +  CollapseSelection() {}
    +
    --- End diff --
    
    We need to add a `override` destructor.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112828368
  
    --- Diff: query_optimizer/expressions/Scalar.hpp ---
    @@ -65,10 +67,49 @@ class Scalar : public Expression {
           const std::unordered_map<ExprId, const CatalogAttribute*>& substitution_map)
           const = 0;
     
    +  /**
    +   * @brief Check whether this scalar is semantically equivalent to \p other.
    +   *
    +   * @note The fact that two scalars are semantically equal brings more
    +   *       optimization opportunities, e.g. common subexpression elimination.
    +   *       Meanwhile, it is always safe to assume that two scalars are not equal.
    +   *
    +   * @return True if this scalar equals \p other; false otherwise.
    +   */
    +  virtual bool equals(const ScalarPtr &other) const {
    +    return false;
    +  }
    +
    +  /**
    +   * @brief Get a hash of this scalar.
    +   *
    +   * @return A hash of this scalar.
    +   */
    +  std::size_t hash() const {
    +    if (hash_cache_ == nullptr) {
    +      hash_cache_ = std::make_unique<std::size_t>(computeHash());
    +    }
    +    return *hash_cache_;
    +  }
    +
      protected:
       Scalar() {}
     
    +  /**
    +   * @brief Compute a hash of this scalar.
    +   *
    +   * @note Override this method to actually compute the hash. Note that the
    +   *       public method hash() is a caching wrapper for this method.
    +   *
    +   * @return A hash of this scalar.
    +   */
    +  virtual std::size_t computeHash() const {
    +    throw HashNotSupported("Unsupported computeHash() in " + getName());
    --- End diff --
    
    Suggest to change to a pure virtual method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112826903
  
    --- Diff: expressions/scalar/Scalar.hpp ---
    @@ -55,6 +59,7 @@ class Scalar {
         kUnaryExpression,
         kBinaryExpression,
         kCaseExpression,
    +    kSharedExpression,
    --- End diff --
    
    This is a bug: the value of `kSharedExpression` should be same with its index in `kScalarDataSourceNames` defined in the source file.
    
    Suggest to reorder two values `SimpleCase` and `SharedExpression` in the source file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112847605
  
    --- Diff: query_optimizer/expressions/CommonSubexpression.hpp ---
    @@ -0,0 +1,141 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
    +
    +#include <memory>
    +#include <string>
    +#include <unordered_map>
    +#include <vector>
    +
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/expressions/Expression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "utility/Macros.hpp"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +
    +class Scalar;
    +class Type;
    +
    +namespace optimizer {
    +namespace expressions {
    +
    +/** \addtogroup OptimizerExpressions
    + *  @{
    + */
    +
    +class CommonSubexpression;
    +typedef std::shared_ptr<const CommonSubexpression> CommonSubexpressionPtr;
    +
    +/**
    + * @brief A wrapper expression that represents a common subexpression.
    + */
    +class CommonSubexpression : public Scalar {
    + public:
    +  ExpressionType getExpressionType() const override {
    +    return ExpressionType::kCommonSubexpression;
    +  }
    +
    +  std::string getName() const override {
    +    return "CommonSubexpression";
    +  }
    +
    +  bool isConstant() const override {
    +    return operand_->isConstant();
    +  }
    +
    +  /**
    +   * @return The unique ID for the equivalence class that this common subexpression
    +   *         belongs to.
    +   */
    +  inline ExprId common_subexpression_id() const {
    +    return common_subexpression_id_;
    +  }
    +
    +  /**
    +   * @return The operand that represents the underlying subexpression.
    +   */
    +  const ScalarPtr& operand() const {
    +    return operand_;
    +  }
    --- End diff --
    
    We need to move above two methods after all `override` ones.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112848246
  
    --- Diff: query_optimizer/rules/ReuseAggregateExpressions.cpp ---
    @@ -0,0 +1,349 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "query_optimizer/rules/ReuseAggregateExpressions.hpp"
    +
    +#include <cstddef>
    +#include <list>
    +#include <map>
    +#include <memory>
    +#include <unordered_map>
    +#include <vector>
    +
    +#include "expressions/aggregation/AggregateFunction.hpp"
    +#include "expressions/aggregation/AggregateFunctionFactory.hpp"
    +#include "expressions/aggregation/AggregationID.hpp"
    +#include "query_optimizer/OptimizerContext.hpp"
    +#include "query_optimizer/expressions/AggregateFunction.hpp"
    +#include "query_optimizer/expressions/Alias.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/BinaryExpression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/ExpressionUtil.hpp"
    +#include "query_optimizer/expressions/NamedExpression.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "query_optimizer/physical/Aggregate.hpp"
    +#include "query_optimizer/physical/PatternMatcher.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/PhysicalType.hpp"
    +#include "query_optimizer/physical/Selection.hpp"
    +#include "query_optimizer/physical/TopLevelPlan.hpp"
    +#include "types/operations/binary_operations/BinaryOperation.hpp"
    +#include "types/operations/binary_operations/BinaryOperationFactory.hpp"
    +#include "types/operations/binary_operations/BinaryOperationID.hpp"
    +#include "utility/HashError.hpp"
    +
    +#include "gflags/gflags.h"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +DEFINE_uint64(reuse_aggregate_group_size_threshold, 1000u,
    +              "The threshold on estimated number of groups for an aggregation "
    +              "below which the ReuseAggregateExpressions optimization will be "
    +              "performed.");
    +
    +DEFINE_double(reuse_aggregate_ratio_threshold, 0.3,
    +              "The threshold on the ratio of (# of eliminable columns) to "
    +              "(# of original columns) for an aggregation above which the "
    +              "ReuseAggregateExpressions optimization will be performed.");
    +
    +namespace E = ::quickstep::optimizer::expressions;
    +namespace P = ::quickstep::optimizer::physical;
    +
    +void ReuseAggregateExpressions::init(const P::PhysicalPtr &input) {
    +  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
    --- End diff --
    
    Change to `DCHECK_EQ(P::PhysicalType::kTopLevelPlan, input->getPhysicalType());`, if not defined as an enum class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112826746
  
    --- Diff: expressions/predicate/Predicate.hpp ---
    @@ -67,6 +72,11 @@ class Predicate {
       virtual ~Predicate() {
    --- End diff --
    
    Need to change to `~Predicate() override {`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #237: QUICKSTEP-89 Add support for common subexpre...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/237
  
    Will fix the issues in one PR shortly after.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r113070668
  
    --- Diff: query_optimizer/expressions/Scalar.hpp ---
    @@ -65,10 +67,49 @@ class Scalar : public Expression {
           const std::unordered_map<ExprId, const CatalogAttribute*>& substitution_map)
           const = 0;
     
    +  /**
    +   * @brief Check whether this scalar is semantically equivalent to \p other.
    +   *
    +   * @note The fact that two scalars are semantically equal brings more
    +   *       optimization opportunities, e.g. common subexpression elimination.
    +   *       Meanwhile, it is always safe to assume that two scalars are not equal.
    +   *
    +   * @return True if this scalar equals \p other; false otherwise.
    +   */
    +  virtual bool equals(const ScalarPtr &other) const {
    +    return false;
    +  }
    +
    +  /**
    +   * @brief Get a hash of this scalar.
    +   *
    +   * @return A hash of this scalar.
    +   */
    +  std::size_t hash() const {
    +    if (hash_cache_ == nullptr) {
    +      hash_cache_ = std::make_unique<std::size_t>(computeHash());
    +    }
    +    return *hash_cache_;
    --- End diff --
    
    There is no performance consideration here and the pointer makes it clear for the "null" case, since `0` itself can be a valid hash value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #237: QUICKSTEP-89 Add support for common subexpre...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/237
  
    I have not done one pass. May have some more comments coming.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112848427
  
    --- Diff: query_optimizer/rules/ReuseAggregateExpressions.cpp ---
    @@ -0,0 +1,349 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "query_optimizer/rules/ReuseAggregateExpressions.hpp"
    +
    +#include <cstddef>
    +#include <list>
    +#include <map>
    +#include <memory>
    +#include <unordered_map>
    +#include <vector>
    +
    +#include "expressions/aggregation/AggregateFunction.hpp"
    +#include "expressions/aggregation/AggregateFunctionFactory.hpp"
    +#include "expressions/aggregation/AggregationID.hpp"
    +#include "query_optimizer/OptimizerContext.hpp"
    +#include "query_optimizer/expressions/AggregateFunction.hpp"
    +#include "query_optimizer/expressions/Alias.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/BinaryExpression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/ExpressionUtil.hpp"
    +#include "query_optimizer/expressions/NamedExpression.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "query_optimizer/physical/Aggregate.hpp"
    +#include "query_optimizer/physical/PatternMatcher.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/PhysicalType.hpp"
    +#include "query_optimizer/physical/Selection.hpp"
    +#include "query_optimizer/physical/TopLevelPlan.hpp"
    +#include "types/operations/binary_operations/BinaryOperation.hpp"
    +#include "types/operations/binary_operations/BinaryOperationFactory.hpp"
    +#include "types/operations/binary_operations/BinaryOperationID.hpp"
    +#include "utility/HashError.hpp"
    +
    +#include "gflags/gflags.h"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +DEFINE_uint64(reuse_aggregate_group_size_threshold, 1000u,
    +              "The threshold on estimated number of groups for an aggregation "
    +              "below which the ReuseAggregateExpressions optimization will be "
    +              "performed.");
    +
    +DEFINE_double(reuse_aggregate_ratio_threshold, 0.3,
    +              "The threshold on the ratio of (# of eliminable columns) to "
    +              "(# of original columns) for an aggregation above which the "
    +              "ReuseAggregateExpressions optimization will be performed.");
    +
    +namespace E = ::quickstep::optimizer::expressions;
    +namespace P = ::quickstep::optimizer::physical;
    +
    +void ReuseAggregateExpressions::init(const P::PhysicalPtr &input) {
    +  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
    +
    +  // Initialize cost model.
    +  const P::TopLevelPlanPtr top_level_plan =
    +     std::static_pointer_cast<const P::TopLevelPlan>(input);
    +  cost_model_.reset(
    +      new cost::StarSchemaSimpleCostModel(top_level_plan->shared_subplans()));
    +}
    +
    +P::PhysicalPtr ReuseAggregateExpressions::applyToNode(
    +    const P::PhysicalPtr &input) {
    +  P::AggregatePtr aggregate;
    +  if (!P::SomeAggregate::MatchesWithConditionalCast(input, &aggregate)) {
    +    return input;
    +  }
    +
    +  const std::vector<E::AliasPtr> &agg_exprs = aggregate->aggregate_expressions();
    +
    +  // Maps aggregated expressions to AggregationID + positions.
    +  //
    +  // For example,
    +  // --
    +  //   SELECT SUM(x+1), AVG(x+1), SUM(x+1), COUNT(*), SUM(y) FROM s;
    +  // --
    +  // will generate *agg_expr_info* as
    +  // --
    +  // {
    +  //   x+1: { kSum: [0, 2], kAvg: [1] },
    +  //   y: { kSum: [4] },
    +  // }
    +  // --
    +  // and COUNT(*) will be recorded inside *count_star_info*.
    +  std::unordered_map<E::ScalarPtr,
    +                     std::map<AggregationID, std::vector<std::size_t>>,
    +                     ScalarHash, ScalarEqual> agg_expr_info;
    +  std::list<std::size_t> count_star_info;
    +
    +  for (std::size_t i = 0; i < agg_exprs.size(); ++i) {
    +    DCHECK(agg_exprs[i]->expression()->getExpressionType()
    +               == E::ExpressionType::kAggregateFunction);
    +    const E::AggregateFunctionPtr agg_expr =
    +        std::static_pointer_cast<const E::AggregateFunction>(
    +            agg_exprs[i]->expression());
    +
    +    // Skip DISTINCT aggregations.
    +    if (agg_expr->is_distinct()) {
    +      continue;
    +    }
    +
    +    const AggregationID agg_id = agg_expr->getAggregate().getAggregationID();
    +    const std::vector<E::ScalarPtr> &arguments = agg_expr->getArguments();
    +
    +    // Currently we only consider aggregate functions with 0 or 1 argument.
    +    if (agg_id == AggregationID::kCount) {
    +      if (arguments.empty()) {
    +        count_star_info.emplace_front(i);
    +        continue;
    +      } else if (!arguments.front()->getValueType().isNullable()) {
    +        // For COUNT(x) where x is not null, we view it as a more general COUNT(*).
    +        count_star_info.emplace_back(i);
    +        continue;
    +      }
    +    }
    +    if (arguments.size() == 1) {
    +      try {
    +        agg_expr_info[arguments.front()][agg_id].emplace_back(i);
    +      } catch (const HashNotSupported &e) {
    +        continue;
    +      }
    +    }
    +  }
    +
    +  // Now for each aggregate expression, figure out whether we can avoid the
    +  // computation by reusing other aggregate expression's result.
    +  std::vector<std::unique_ptr<AggregateReference>> agg_refs(agg_exprs.size());
    +
    +  constexpr std::size_t kInvalidRef = static_cast<std::size_t>(-1);
    +  std::size_t count_star_ref;
    +
    +  // Check whether COUNT(*) is available.
    +  if (count_star_info.empty()) {
    +    count_star_ref = kInvalidRef;
    +  } else {
    +    auto it = count_star_info.begin();
    +    count_star_ref = *it;
    +
    +    for (++it; it != count_star_info.end(); ++it) {
    +      agg_refs[*it].reset(new AggregateReference(count_star_ref));
    --- End diff --
    
    We should not use `new` operations for smart pointers to avoid memory leaks upon system failures.
    
    Instead, we could use `agg_refs[*it] = std::make_unique<AggregateReference>(count_star_ref);`.
    
    Ditto for all `new` usages below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112847711
  
    --- Diff: query_optimizer/expressions/SimpleCase.cpp ---
    @@ -161,6 +163,50 @@ ::quickstep::Scalar* SimpleCase::concretize(
           else_result_expression.release());
     }
     
    +std::size_t SimpleCase::computeHash() const {
    +  std::size_t hash_code =
    +      CombineHashes(static_cast<std::size_t>(ExpressionType::kSimpleCase),
    +                    case_operand_->hash());
    +  for (std::size_t i = 0; i < condition_operands_.size(); ++i) {
    +    hash_code = CombineHashes(hash_code, condition_operands_[i]->hash());
    +    hash_code = CombineHashes(hash_code, conditional_result_expressions_[i]->hash());
    +  }
    +  if (else_result_expression_ != nullptr) {
    +    hash_code = CombineHashes(hash_code, else_result_expression_->hash());
    +  }
    +  return hash_code;
    +}
    +
    +bool SimpleCase::equals(const ScalarPtr &other) const {
    +  SimpleCasePtr expr;
    +  if (!SomeSimpleCase::MatchesWithConditionalCast(other, &expr)) {
    +    return false;
    +  }
    +  if (!case_operand_->equals(expr->case_operand_)) {
    +    return false;
    +  }
    +  if (condition_operands_.size() != expr->condition_operands_.size()) {
    +    return false;
    +  }
    +  for (std::size_t i = 0; i < condition_operands_.size(); ++i) {
    +    if (!condition_operands_[i]->equals(expr->condition_operands_[i])
    +        || !conditional_result_expressions_[i]->equals(
    +                expr->conditional_result_expressions_[i])) {
    +      return false;
    +    }
    +  }
    +  if ((else_result_expression_ == nullptr
    +       || expr->else_result_expression_ == nullptr)
    --- End diff --
    
    Please add four whitespace indentations.
    
    Actually, to be consistent with other bool expression usages, I recommend to refactor as following:
    
    ```
      if ((else_result_expression_ == nullptr ||
               expr->else_result_expression_ == nullptr)) &&
          else_result_expression_ != expr->else_result_expression_) {
      ...
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #237: QUICKSTEP-89 Add support for common subexpre...

Posted by pateljm <gi...@git.apache.org>.
Github user pateljm commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/237
  
    Excellent work @jianqiao. LGTM. Merging.
    
    Side note: One of the jobs timed out, so we should consider removing some of the tests from the type system once you are done with the new type system.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112847881
  
    --- Diff: query_optimizer/rules/CollapseSelection.cpp ---
    @@ -0,0 +1,59 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "query_optimizer/rules/CollapseSelection.hpp"
    +
    +#include <vector>
    +
    +#include "query_optimizer/expressions/NamedExpression.hpp"
    +#include "query_optimizer/physical/PatternMatcher.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/Selection.hpp"
    +#include "query_optimizer/rules/RuleHelper.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +namespace E = ::quickstep::optimizer::expressions;
    +namespace P = ::quickstep::optimizer::physical;
    +
    +P::PhysicalPtr CollapseSelection::applyToNode(const P::PhysicalPtr &input) {
    +  P::SelectionPtr selection;
    +  P::SelectionPtr child_selection;
    +
    +  // TODO(jianqiao): Handle the case where filter predicates are present.
    +  if (P::SomeSelection::MatchesWithConditionalCast(input, &selection) &&
    +      P::SomeSelection::MatchesWithConditionalCast(selection->input(), &child_selection) &&
    +      selection->filter_predicate() == nullptr &&
    +      child_selection->filter_predicate() == nullptr) {
    +    std::vector<E::NamedExpressionPtr> project_expressions =
    +        selection->project_expressions();
    +    PullUpProjectExpressions(child_selection->project_expressions(),
    +                             {} /* non_project_expression_lists */,
    +                             {&project_expressions} /* project_expression_lists */);
    --- End diff --
    
    Suggest to change to `{ &project_expressions }` for more readability.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112828079
  
    --- Diff: expressions/scalar/tests/ScalarCaseExpression_unittest.cpp ---
    @@ -223,8 +223,8 @@ TEST_F(ScalarCaseExpressionTest, BasicComparisonAndLiteralTest) {
           new ScalarLiteral(TypedValue(kVarChar, kThirdLawString, std::strlen(kThirdLawString) + 1),
                             varchar_type));
     
    -  std::unique_ptr<ColumnVector> result_cv(
    -      case_expr.getAllValues(&sample_data_value_accessor_, nullptr));
    +  ColumnVectorPtr result_cv(
    +      case_expr.getAllValues(&sample_data_value_accessor_, nullptr, nullptr));
    --- End diff --
    
    Please add comments for `nullptr`. Ditto below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112848590
  
    --- Diff: query_optimizer/rules/ReuseAggregateExpressions.cpp ---
    @@ -0,0 +1,349 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "query_optimizer/rules/ReuseAggregateExpressions.hpp"
    +
    +#include <cstddef>
    +#include <list>
    +#include <map>
    +#include <memory>
    +#include <unordered_map>
    +#include <vector>
    +
    +#include "expressions/aggregation/AggregateFunction.hpp"
    +#include "expressions/aggregation/AggregateFunctionFactory.hpp"
    +#include "expressions/aggregation/AggregationID.hpp"
    +#include "query_optimizer/OptimizerContext.hpp"
    +#include "query_optimizer/expressions/AggregateFunction.hpp"
    +#include "query_optimizer/expressions/Alias.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/BinaryExpression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/ExpressionUtil.hpp"
    +#include "query_optimizer/expressions/NamedExpression.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "query_optimizer/physical/Aggregate.hpp"
    +#include "query_optimizer/physical/PatternMatcher.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/PhysicalType.hpp"
    +#include "query_optimizer/physical/Selection.hpp"
    +#include "query_optimizer/physical/TopLevelPlan.hpp"
    +#include "types/operations/binary_operations/BinaryOperation.hpp"
    +#include "types/operations/binary_operations/BinaryOperationFactory.hpp"
    +#include "types/operations/binary_operations/BinaryOperationID.hpp"
    +#include "utility/HashError.hpp"
    +
    +#include "gflags/gflags.h"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +DEFINE_uint64(reuse_aggregate_group_size_threshold, 1000u,
    +              "The threshold on estimated number of groups for an aggregation "
    +              "below which the ReuseAggregateExpressions optimization will be "
    +              "performed.");
    +
    +DEFINE_double(reuse_aggregate_ratio_threshold, 0.3,
    +              "The threshold on the ratio of (# of eliminable columns) to "
    +              "(# of original columns) for an aggregation above which the "
    +              "ReuseAggregateExpressions optimization will be performed.");
    +
    +namespace E = ::quickstep::optimizer::expressions;
    +namespace P = ::quickstep::optimizer::physical;
    +
    +void ReuseAggregateExpressions::init(const P::PhysicalPtr &input) {
    +  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
    +
    +  // Initialize cost model.
    +  const P::TopLevelPlanPtr top_level_plan =
    +     std::static_pointer_cast<const P::TopLevelPlan>(input);
    +  cost_model_.reset(
    +      new cost::StarSchemaSimpleCostModel(top_level_plan->shared_subplans()));
    +}
    +
    +P::PhysicalPtr ReuseAggregateExpressions::applyToNode(
    +    const P::PhysicalPtr &input) {
    +  P::AggregatePtr aggregate;
    +  if (!P::SomeAggregate::MatchesWithConditionalCast(input, &aggregate)) {
    +    return input;
    +  }
    +
    +  const std::vector<E::AliasPtr> &agg_exprs = aggregate->aggregate_expressions();
    +
    +  // Maps aggregated expressions to AggregationID + positions.
    +  //
    +  // For example,
    +  // --
    +  //   SELECT SUM(x+1), AVG(x+1), SUM(x+1), COUNT(*), SUM(y) FROM s;
    +  // --
    +  // will generate *agg_expr_info* as
    +  // --
    +  // {
    +  //   x+1: { kSum: [0, 2], kAvg: [1] },
    +  //   y: { kSum: [4] },
    +  // }
    +  // --
    +  // and COUNT(*) will be recorded inside *count_star_info*.
    +  std::unordered_map<E::ScalarPtr,
    +                     std::map<AggregationID, std::vector<std::size_t>>,
    +                     ScalarHash, ScalarEqual> agg_expr_info;
    +  std::list<std::size_t> count_star_info;
    +
    +  for (std::size_t i = 0; i < agg_exprs.size(); ++i) {
    +    DCHECK(agg_exprs[i]->expression()->getExpressionType()
    +               == E::ExpressionType::kAggregateFunction);
    +    const E::AggregateFunctionPtr agg_expr =
    +        std::static_pointer_cast<const E::AggregateFunction>(
    +            agg_exprs[i]->expression());
    +
    +    // Skip DISTINCT aggregations.
    +    if (agg_expr->is_distinct()) {
    +      continue;
    +    }
    +
    +    const AggregationID agg_id = agg_expr->getAggregate().getAggregationID();
    +    const std::vector<E::ScalarPtr> &arguments = agg_expr->getArguments();
    +
    +    // Currently we only consider aggregate functions with 0 or 1 argument.
    +    if (agg_id == AggregationID::kCount) {
    +      if (arguments.empty()) {
    +        count_star_info.emplace_front(i);
    +        continue;
    +      } else if (!arguments.front()->getValueType().isNullable()) {
    +        // For COUNT(x) where x is not null, we view it as a more general COUNT(*).
    +        count_star_info.emplace_back(i);
    +        continue;
    +      }
    +    }
    +    if (arguments.size() == 1) {
    +      try {
    +        agg_expr_info[arguments.front()][agg_id].emplace_back(i);
    +      } catch (const HashNotSupported &e) {
    +        continue;
    +      }
    +    }
    +  }
    +
    +  // Now for each aggregate expression, figure out whether we can avoid the
    +  // computation by reusing other aggregate expression's result.
    +  std::vector<std::unique_ptr<AggregateReference>> agg_refs(agg_exprs.size());
    +
    +  constexpr std::size_t kInvalidRef = static_cast<std::size_t>(-1);
    +  std::size_t count_star_ref;
    +
    +  // Check whether COUNT(*) is available.
    +  if (count_star_info.empty()) {
    +    count_star_ref = kInvalidRef;
    +  } else {
    +    auto it = count_star_info.begin();
    +    count_star_ref = *it;
    +
    +    for (++it; it != count_star_info.end(); ++it) {
    +      agg_refs[*it].reset(new AggregateReference(count_star_ref));
    +    }
    +  }
    +
    +  // Iterate through agg_expr_info to find all transformation opportunities,
    +  // and record them into agg_refs.
    +  for (const auto &it : agg_expr_info) {
    +    const auto &ref_map = it.second;
    +
    +    // First, check whether AVG can be reduced to SUM/COUNT.
    +    bool is_avg_processed = false;
    +
    +    const auto avg_it = ref_map.find(AggregationID::kAvg);
    +    if (avg_it != ref_map.end()) {
    +      std::size_t count_ref = kInvalidRef;
    +
    +      // To reduce AVG to SUM/COUNT, we need a COUNT available.
    +      // TODO(jianqiao): We may even add a new COUNT(*) aggregation if it is not
    +      // there. E.g. when there are at least two AVG(...) aggregate functions.
    +      if (it.first->getValueType().isNullable()) {
    +        const auto count_it = ref_map.find(AggregationID::kCount);
    +        if (count_it != ref_map.end()) {
    +          DCHECK(!count_it->second.empty());
    +          count_ref = count_it->second.front();
    +        }
    +      } else {
    +        count_ref = count_star_ref;
    +      }
    +
    +      if (count_ref != kInvalidRef) {
    +        // It is done if there is an existing SUM. Otherwise we strength-reduce
    +        // the current AVG to SUM.
    +        const auto sum_it = ref_map.find(AggregationID::kSum);
    +        const std::size_t sum_ref =
    +            sum_it == ref_map.end() ? kInvalidRef : sum_it->second.front();
    +
    +        for (const std::size_t idx : avg_it->second) {
    +          agg_refs[idx].reset(new AggregateReference(sum_ref, count_ref));
    +        }
    +        is_avg_processed = true;
    +      }
    +    }
    +
    +    // Then, identify duplicate aggregate expressions.
    +    for (const auto &ref_it : ref_map) {
    +      if (ref_it.first == AggregationID::kAvg && is_avg_processed) {
    +        continue;
    +      }
    +      const auto &indices = ref_it.second;
    +      DCHECK(!indices.empty());
    +      const std::size_t ref = indices.front();
    +      for (std::size_t i = 1; i < indices.size(); ++i) {
    +        agg_refs[indices[i]].reset(new AggregateReference(ref));
    +      }
    +    }
    +  }
    +
    +  // Count the number of eliminable aggregate expressions.
    +  std::size_t num_eliminable = 0;
    +  for (const auto &agg_ref : agg_refs) {
    +    if (agg_ref != nullptr) {
    +      ++num_eliminable;
    +    }
    +  }
    +
    +  if (num_eliminable == 0) {
    +    return input;
    +  }
    +
    +  // Now we need to make a decision since it is not always benefitial to perform
    +  // the transformation. Currently we employ a simple heuristic that if either
    +  // (1) The estimated number of groups for this Aggregate node is smaller than
    +  //     the specified FLAGS_reuse_aggregate_group_size_threshold.
    +  // or
    +  // (2) The ratio of (# of eliminable columns) to (# of original columns) is
    +  //     greater than the specified FLAGS_reuse_aggregate_ratio_threshold.
    +  // then we perform the transformation.
    +  const bool is_group_size_condition_satisfied =
    +      cost_model_->estimateNumGroupsForAggregate(aggregate)
    +          < FLAGS_reuse_aggregate_group_size_threshold;
    +  const bool is_ratio_condition_satisfied =
    +      static_cast<double>(num_eliminable) / agg_exprs.size()
    +          > FLAGS_reuse_aggregate_ratio_threshold;
    +
    +  if (!is_group_size_condition_satisfied && !is_ratio_condition_satisfied) {
    +    return input;
    +  }
    +
    +  // Now we transform the original Aggregate to a reduced Aggregate + a wrapping
    +  // Selection.
    +
    +  // Aggregate expressions for the new Aggregate.
    +  std::vector<E::AliasPtr> new_agg_exprs;
    +
    +  // Project expressions for the new Selection.
    +  std::vector<E::NamedExpressionPtr> new_select_exprs;
    +
    +  for (const auto &grouping_expr : aggregate->grouping_expressions()) {
    +    new_select_exprs.emplace_back(E::ToRef(grouping_expr));
    +  }
    +
    +  const std::vector<E::AttributeReferencePtr> agg_attrs = E::ToRefVector(agg_exprs);
    +
    +  for (std::size_t i = 0; i < agg_refs.size(); ++i) {
    +    const auto &agg_ref = agg_refs[i];
    +    const E::AliasPtr &agg_expr = agg_exprs[i];
    +
    +    if (agg_ref == nullptr) {
    +      // Case 1: this aggregate expression can not be eliminated.
    +      new_agg_exprs.emplace_back(agg_expr);
    +      new_select_exprs.emplace_back(
    +          E::AttributeReference::Create(agg_expr->id(),
    +                                        agg_expr->attribute_name(),
    +                                        agg_expr->attribute_alias(),
    +                                        agg_expr->relation_name(),
    +                                        agg_expr->getValueType(),
    +                                        E::AttributeReferenceScope::kLocal));
    +    } else {
    +      switch (agg_ref->kind) {
    +        // Case 2.1: this aggregate expression can be eliminated.
    +        case AggregateReference::kDirect: {
    +          new_select_exprs.emplace_back(
    +              E::Alias::Create(agg_expr->id(),
    +                               agg_attrs[agg_ref->first_ref],
    +                               agg_expr->attribute_name(),
    +                               agg_expr->attribute_alias(),
    +                               agg_expr->relation_name()));
    +          break;
    +        }
    +        // Case 2.2: this aggregate expression is an AVG.
    +        case AggregateReference::kAvg: {
    +          E::AttributeReferencePtr sum_attr;
    +          if (agg_ref->first_ref == kInvalidRef) {
    +            // Case 2.2.1: If there is no existing SUM, we need to convert this
    +            //             AVG to SUM.
    +            const E::AggregateFunctionPtr avg_expr =
    +                std::static_pointer_cast<const E::AggregateFunction>(agg_expr->expression());
    +
    +            const AggregateFunction &sum_func =
    +                AggregateFunctionFactory::Get(AggregationID::kSum);
    +            const E::AggregateFunctionPtr sum_expr =
    +                E::AggregateFunction::Create(sum_func,
    +                                             avg_expr->getArguments(),
    +                                             avg_expr->is_vector_aggregate(),
    +                                             avg_expr->is_distinct());
    +            new_agg_exprs.emplace_back(
    +                E::Alias::Create(optimizer_context_->nextExprId(),
    +                                 sum_expr,
    +                                 agg_expr->attribute_name(),
    +                                 agg_expr->attribute_alias(),
    +                                 agg_expr->relation_name()));
    +
    +            sum_attr = E::ToRef(new_agg_exprs.back());
    +          } else {
    +            // Case 2.2.2: If there is a SUM somewhere, we just eliminate this
    +            //             AVG and use the result from that SUM.
    +            sum_attr = agg_attrs[agg_ref->first_ref];
    +          }
    +
    +          // Obtain AVG by evaluating SUM/COUNT in Selection.
    +          const BinaryOperation &divide_op =
    +              BinaryOperationFactory::GetBinaryOperation(BinaryOperationID::kDivide);
    +          const E::BinaryExpressionPtr avg_expr =
    +              E::BinaryExpression::Create(divide_op,
    +                                          sum_attr,
    +                                          agg_attrs[agg_ref->second_ref]);
    +          new_select_exprs.emplace_back(
    +              E::Alias::Create(agg_expr->id(),
    +                               avg_expr,
    +                               agg_expr->attribute_name(),
    +                               agg_expr->attribute_alias(),
    +                               agg_expr->relation_name()));
    +        }
    +      }
    --- End diff --
    
    To be safe, please add a `break;` here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112847577
  
    --- Diff: query_optimizer/expressions/CommonSubexpression.hpp ---
    @@ -0,0 +1,141 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
    +
    +#include <memory>
    +#include <string>
    +#include <unordered_map>
    +#include <vector>
    +
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/expressions/Expression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "utility/Macros.hpp"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +
    +class Scalar;
    +class Type;
    +
    +namespace optimizer {
    +namespace expressions {
    +
    +/** \addtogroup OptimizerExpressions
    + *  @{
    + */
    +
    +class CommonSubexpression;
    +typedef std::shared_ptr<const CommonSubexpression> CommonSubexpressionPtr;
    +
    +/**
    + * @brief A wrapper expression that represents a common subexpression.
    + */
    +class CommonSubexpression : public Scalar {
    + public:
    +  ExpressionType getExpressionType() const override {
    --- End diff --
    
    We need to add a `override` destructor.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112848219
  
    --- Diff: query_optimizer/rules/ExtractCommonSubexpression.hpp ---
    @@ -0,0 +1,179 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_EXTRACT_COMMON_SUBEXPRESSION_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_EXTRACT_COMMON_SUBEXPRESSION_HPP_
    +
    +#include <cstddef>
    +#include <functional>
    +#include <memory>
    +#include <string>
    +#include <type_traits>
    +#include <unordered_map>
    +#include <unordered_set>
    +#include <vector>
    +
    +#include "query_optimizer/expressions/CommonSubexpression.hpp"
    +#include "query_optimizer/expressions/Expression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/rules/Rule.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +class OptimizerContext;
    +
    +/** \addtogroup OptimizerRules
    + *  @{
    + */
    +
    +/**
    + * @brief Rule that applies to a physical plan to identify and label common
    + *        subexpressions.
    + *
    + * @note This is essentially a logical optimization pass. However, we need some
    + *       of the physical passes (e.g. ReuseAggregateExpressions) to be finalized
    + *       before this one to maximize optimization opportunities.
    + */
    +class ExtractCommonSubexpression : public Rule<physical::Physical> {
    + public:
    +  /**
    +   * @brief Constructor.
    +   *
    +   * @param optimizer_context The optimizer context.
    +   */
    +  explicit ExtractCommonSubexpression(OptimizerContext *optimizer_context);
    +
    +  ~ExtractCommonSubexpression() override {}
    +
    +  std::string getName() const override {
    +    return "ExtractCommonSubexpression";
    +  }
    +
    +  physical::PhysicalPtr apply(const physical::PhysicalPtr &input) override;
    +
    + private:
    +  physical::PhysicalPtr applyInternal(const physical::PhysicalPtr &input);
    +
    +  struct ScalarHash {
    +    inline std::size_t operator()(const expressions::ScalarPtr &scalar) const {
    +      return scalar->hash();
    +    }
    +  };
    +
    +  struct ScalarEqual {
    +    inline bool operator()(const expressions::ScalarPtr &lhs,
    +                           const expressions::ScalarPtr &rhs) const {
    +      return lhs->equals(rhs);
    +    }
    +  };
    +
    +  // For memorizing whether a subexpression is hashable.
    +  using ScalarHashable = std::unordered_set<expressions::ScalarPtr>;
    +
    +  // For counting the number of occurrences of each subexpression.
    +  using ScalarCounter =
    +      std::unordered_map<expressions::ScalarPtr, std::size_t, ScalarHash, ScalarEqual>;
    +
    +  // For mapping each subexpression to its transformed version.
    +  using ScalarMap =
    +      std::unordered_map<expressions::ScalarPtr,
    +                         expressions::CommonSubexpressionPtr,
    +                         ScalarHash,
    +                         ScalarEqual>;
    +
    +  std::vector<expressions::ExpressionPtr> transformExpressions(
    +      const std::vector<expressions::ExpressionPtr> &expressions);
    +
    +  expressions::ExpressionPtr transformExpression(
    +      const expressions::ExpressionPtr &expression);
    +
    +  // Traverse the expression tree and increase the count of each subexpression.
    +  bool visitAndCount(
    +      const expressions::ExpressionPtr &expression,
    +      ScalarCounter *counter,
    +      ScalarHashable *hashable);
    +
    +  // Traverse the expression tree and transform subexpressions (to common
    +  // subexpressions) if applicable.
    +  expressions::ExpressionPtr visitAndTransform(
    +      const expressions::ExpressionPtr &expression,
    +      const std::size_t max_reference_count,
    +      const ScalarCounter &counter,
    +      const ScalarHashable &hashable,
    +      ScalarMap *substitution_map);
    +
    +  template <typename ScalarSubclassT>
    +  static std::vector<expressions::ExpressionPtr> UpCast(
    +      const std::vector<std::shared_ptr<const ScalarSubclassT>> &expressions) {
    +    std::vector<expressions::ExpressionPtr> output;
    +    for (const auto &expr : expressions) {
    +      output.emplace_back(expr);
    +    }
    +    return output;
    +  }
    +
    +  template <typename ScalarSubclassT>
    +  static std::vector<std::shared_ptr<const ScalarSubclassT>> DownCast(
    +      const std::vector<expressions::ExpressionPtr> &expressions) {
    +    std::vector<std::shared_ptr<const ScalarSubclassT>> output;
    +    for (const auto &expr : expressions) {
    +      output.emplace_back(std::static_pointer_cast<const ScalarSubclassT>(expr));
    +    }
    +    return output;
    +  }
    +
    +  struct ExpressionTypeHash {
    +    using UnderT = std::underlying_type<expressions::ExpressionType>::type;
    +
    +    inline std::size_t operator()(const expressions::ExpressionType &et) const {
    +      return std::hash<UnderT>()(static_cast<UnderT>(et));
    +    }
    +  };
    +
    +  // Here we define that an expression type is homogeneous if at execution time
    +  // the result column vector of that expression has a one-to-one positional
    +  // correspondence with all the result column vectors from its child expressions.
    +  // E.g. aggregate functions and CASE expressions are not homogeneous.
    +  //
    +  // Being homogeneous enables common subexpressions to be hoisted through.
    +  // For example, consider the case
    +  // --
    +  //   (x * 2) + F(x * 2)
    +  // --
    +  // where F is some unary expression. Then if F is homogenous, we can mark the
    +  // two (x * 2) as common subexpressions. Otherwise, we cannot do that since
    +  // the two subexpressions will generate different result column vectors.
    +  std::unordered_set<expressions::ExpressionType,
    +                     ExpressionTypeHash> homogeneous_expression_types_;
    +
    +  OptimizerContext *optimizer_context_;
    --- End diff --
    
    It seems that we do not modify this object, so I guess we could mark it `const`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112848329
  
    --- Diff: query_optimizer/rules/ReuseAggregateExpressions.cpp ---
    @@ -0,0 +1,349 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "query_optimizer/rules/ReuseAggregateExpressions.hpp"
    +
    +#include <cstddef>
    +#include <list>
    +#include <map>
    +#include <memory>
    +#include <unordered_map>
    +#include <vector>
    +
    +#include "expressions/aggregation/AggregateFunction.hpp"
    +#include "expressions/aggregation/AggregateFunctionFactory.hpp"
    +#include "expressions/aggregation/AggregationID.hpp"
    +#include "query_optimizer/OptimizerContext.hpp"
    +#include "query_optimizer/expressions/AggregateFunction.hpp"
    +#include "query_optimizer/expressions/Alias.hpp"
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/BinaryExpression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/ExpressionUtil.hpp"
    +#include "query_optimizer/expressions/NamedExpression.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "query_optimizer/physical/Aggregate.hpp"
    +#include "query_optimizer/physical/PatternMatcher.hpp"
    +#include "query_optimizer/physical/Physical.hpp"
    +#include "query_optimizer/physical/PhysicalType.hpp"
    +#include "query_optimizer/physical/Selection.hpp"
    +#include "query_optimizer/physical/TopLevelPlan.hpp"
    +#include "types/operations/binary_operations/BinaryOperation.hpp"
    +#include "types/operations/binary_operations/BinaryOperationFactory.hpp"
    +#include "types/operations/binary_operations/BinaryOperationID.hpp"
    +#include "utility/HashError.hpp"
    +
    +#include "gflags/gflags.h"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +namespace optimizer {
    +
    +DEFINE_uint64(reuse_aggregate_group_size_threshold, 1000u,
    +              "The threshold on estimated number of groups for an aggregation "
    +              "below which the ReuseAggregateExpressions optimization will be "
    +              "performed.");
    +
    +DEFINE_double(reuse_aggregate_ratio_threshold, 0.3,
    +              "The threshold on the ratio of (# of eliminable columns) to "
    +              "(# of original columns) for an aggregation above which the "
    +              "ReuseAggregateExpressions optimization will be performed.");
    +
    +namespace E = ::quickstep::optimizer::expressions;
    +namespace P = ::quickstep::optimizer::physical;
    +
    +void ReuseAggregateExpressions::init(const P::PhysicalPtr &input) {
    +  DCHECK(input->getPhysicalType() == P::PhysicalType::kTopLevelPlan);
    +
    +  // Initialize cost model.
    +  const P::TopLevelPlanPtr top_level_plan =
    +     std::static_pointer_cast<const P::TopLevelPlan>(input);
    +  cost_model_.reset(
    +      new cost::StarSchemaSimpleCostModel(top_level_plan->shared_subplans()));
    +}
    +
    +P::PhysicalPtr ReuseAggregateExpressions::applyToNode(
    +    const P::PhysicalPtr &input) {
    +  P::AggregatePtr aggregate;
    +  if (!P::SomeAggregate::MatchesWithConditionalCast(input, &aggregate)) {
    +    return input;
    +  }
    +
    +  const std::vector<E::AliasPtr> &agg_exprs = aggregate->aggregate_expressions();
    +
    +  // Maps aggregated expressions to AggregationID + positions.
    +  //
    +  // For example,
    +  // --
    +  //   SELECT SUM(x+1), AVG(x+1), SUM(x+1), COUNT(*), SUM(y) FROM s;
    +  // --
    +  // will generate *agg_expr_info* as
    +  // --
    +  // {
    +  //   x+1: { kSum: [0, 2], kAvg: [1] },
    +  //   y: { kSum: [4] },
    --- End diff --
    
    Minor: by conversion, an array / a vector is represented using `{ 0, 2 }`. So, I suggest to change to `x+1: { kSum: { 0, 2 }, kAvg: { 1 } },`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by jianqiao <gi...@git.apache.org>.
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r113070938
  
    --- Diff: query_optimizer/expressions/Scalar.hpp ---
    @@ -65,10 +67,49 @@ class Scalar : public Expression {
           const std::unordered_map<ExprId, const CatalogAttribute*>& substitution_map)
           const = 0;
     
    +  /**
    +   * @brief Check whether this scalar is semantically equivalent to \p other.
    +   *
    +   * @note The fact that two scalars are semantically equal brings more
    +   *       optimization opportunities, e.g. common subexpression elimination.
    +   *       Meanwhile, it is always safe to assume that two scalars are not equal.
    +   *
    +   * @return True if this scalar equals \p other; false otherwise.
    +   */
    +  virtual bool equals(const ScalarPtr &other) const {
    +    return false;
    +  }
    +
    +  /**
    +   * @brief Get a hash of this scalar.
    +   *
    +   * @return A hash of this scalar.
    +   */
    +  std::size_t hash() const {
    +    if (hash_cache_ == nullptr) {
    +      hash_cache_ = std::make_unique<std::size_t>(computeHash());
    +    }
    +    return *hash_cache_;
    --- End diff --
    
    Also `std::hash` returns `std::size_t`, so it may be more consistent to use `std::size_t` as hash value type here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112827868
  
    --- Diff: expressions/scalar/ScalarSharedExpression.cpp ---
    @@ -0,0 +1,141 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "expressions/scalar/ScalarSharedExpression.hpp"
    +
    +#include <string>
    +#include <utility>
    +#include <vector>
    +
    +#include "catalog/CatalogTypedefs.hpp"
    +#include "expressions/Expressions.pb.h"
    +#include "storage/ValueAccessor.hpp"
    +#include "types/TypedValue.hpp"
    +#include "types/containers/ColumnVector.hpp"
    +#include "utility/ColumnVectorCache.hpp"
    +
    +namespace quickstep {
    +
    +struct SubBlocksReference;
    +
    +ScalarSharedExpression::ScalarSharedExpression(const int share_id,
    +                                               Scalar *operand)
    +    : Scalar(operand->getType()),
    +      share_id_(share_id),
    +      operand_(operand) {
    +}
    +
    +serialization::Scalar ScalarSharedExpression::getProto() const {
    +  serialization::Scalar proto;
    +  proto.set_data_source(serialization::Scalar::SHARED_EXPRESSION);
    +  proto.SetExtension(serialization::ScalarSharedExpression::share_id, share_id_);
    +  proto.MutableExtension(serialization::ScalarSharedExpression::operand)
    +      ->CopyFrom(operand_->getProto());
    --- End diff --
    
    Change to `MergeFrom` to avoid the unnecessary clean-up operation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r113071179
  
    --- Diff: query_optimizer/expressions/Scalar.hpp ---
    @@ -65,10 +67,49 @@ class Scalar : public Expression {
           const std::unordered_map<ExprId, const CatalogAttribute*>& substitution_map)
           const = 0;
     
    +  /**
    +   * @brief Check whether this scalar is semantically equivalent to \p other.
    +   *
    +   * @note The fact that two scalars are semantically equal brings more
    +   *       optimization opportunities, e.g. common subexpression elimination.
    +   *       Meanwhile, it is always safe to assume that two scalars are not equal.
    +   *
    +   * @return True if this scalar equals \p other; false otherwise.
    +   */
    +  virtual bool equals(const ScalarPtr &other) const {
    +    return false;
    +  }
    +
    +  /**
    +   * @brief Get a hash of this scalar.
    +   *
    +   * @return A hash of this scalar.
    +   */
    +  std::size_t hash() const {
    +    if (hash_cache_ == nullptr) {
    +      hash_cache_ = std::make_unique<std::size_t>(computeHash());
    +    }
    +    return *hash_cache_;
    --- End diff --
    
    This will be used by all `Scalar` expressions during the optimization phase, right?
    
    Consider that only `0` will be hashed to `0`, I think it is ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112847625
  
    --- Diff: query_optimizer/expressions/CommonSubexpression.hpp ---
    @@ -0,0 +1,141 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
    +#define QUICKSTEP_QUERY_OPTIMIZER_EXPRESSIONS_COMMON_SUBEXPRESSION_HPP_
    +
    +#include <memory>
    +#include <string>
    +#include <unordered_map>
    +#include <vector>
    +
    +#include "query_optimizer/expressions/AttributeReference.hpp"
    +#include "query_optimizer/expressions/ExprId.hpp"
    +#include "query_optimizer/expressions/Expression.hpp"
    +#include "query_optimizer/expressions/ExpressionType.hpp"
    +#include "query_optimizer/expressions/Scalar.hpp"
    +#include "utility/Macros.hpp"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +
    +class Scalar;
    +class Type;
    +
    +namespace optimizer {
    +namespace expressions {
    +
    +/** \addtogroup OptimizerExpressions
    + *  @{
    + */
    +
    +class CommonSubexpression;
    +typedef std::shared_ptr<const CommonSubexpression> CommonSubexpressionPtr;
    +
    +/**
    + * @brief A wrapper expression that represents a common subexpression.
    + */
    +class CommonSubexpression : public Scalar {
    + public:
    +  ExpressionType getExpressionType() const override {
    +    return ExpressionType::kCommonSubexpression;
    +  }
    +
    +  std::string getName() const override {
    +    return "CommonSubexpression";
    +  }
    +
    +  bool isConstant() const override {
    +    return operand_->isConstant();
    +  }
    +
    +  /**
    +   * @return The unique ID for the equivalence class that this common subexpression
    +   *         belongs to.
    +   */
    +  inline ExprId common_subexpression_id() const {
    +    return common_subexpression_id_;
    +  }
    +
    +  /**
    +   * @return The operand that represents the underlying subexpression.
    +   */
    +  const ScalarPtr& operand() const {
    +    return operand_;
    +  }
    +
    +  const Type& getValueType() const override {
    +    return operand_->getValueType();
    +  }
    +
    +  ExpressionPtr copyWithNewChildren(
    +      const std::vector<ExpressionPtr> &new_children) const override;
    +
    +  std::vector<AttributeReferencePtr> getReferencedAttributes() const override {
    +    return operand_->getReferencedAttributes();
    +  }
    +
    +  ::quickstep::Scalar* concretize(
    +      const std::unordered_map<ExprId, const CatalogAttribute*> &substitution_map) const override;
    +
    +  /**
    +   * @brief Creates an immutable CommonSubexpression.
    +   *
    +   * @param common_subexpression_id A unique ID for the equivalence class that
    +   *        this common subexpressions belongs to.
    +   * @param operand The operand that represents the underlying subexpression.
    +   * @return An immutable CommonSubexpression.
    +   */
    +  static CommonSubexpressionPtr Create(const ExprId common_subexpression_id,
    +                                       const ScalarPtr &operand) {
    +    return CommonSubexpressionPtr(
    +        new CommonSubexpression(common_subexpression_id, operand));
    +  }
    +
    + protected:
    +  void getFieldStringItems(
    +      std::vector<std::string> *inline_field_names,
    +      std::vector<std::string> *inline_field_values,
    +      std::vector<std::string> *non_container_child_field_names,
    +      std::vector<OptimizerTreeBaseNodePtr> *non_container_child_fields,
    +      std::vector<std::string> *container_child_field_names,
    +      std::vector<std::vector<OptimizerTreeBaseNodePtr>> *container_child_fields) const override;
    +
    + private:
    +  CommonSubexpression(const ExprId common_subexpression_id,
    +                      const ScalarPtr &operand)
    +      : common_subexpression_id_(common_subexpression_id),
    +        operand_(operand) {
    +    addChild(operand);
    +  }
    +
    +  ExprId common_subexpression_id_;
    +  ScalarPtr operand_;
    --- End diff --
    
    Mark both `const`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112826967
  
    --- Diff: expressions/scalar/ScalarAttribute.cpp ---
    @@ -139,25 +142,26 @@ ColumnVector* ScalarAttribute::getAllValues(ValueAccessor *accessor,
               }
             }
           }
    -      return result;
    +      return ColumnVectorPtr(result);
    --- End diff --
    
    Is the conversion required? I think `result` will be implicitly wrapped to a shared pointer (aka `ColumnVectorPtr`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112827855
  
    --- Diff: expressions/scalar/ScalarSharedExpression.cpp ---
    @@ -0,0 +1,141 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#include "expressions/scalar/ScalarSharedExpression.hpp"
    +
    +#include <string>
    +#include <utility>
    +#include <vector>
    +
    +#include "catalog/CatalogTypedefs.hpp"
    +#include "expressions/Expressions.pb.h"
    +#include "storage/ValueAccessor.hpp"
    +#include "types/TypedValue.hpp"
    +#include "types/containers/ColumnVector.hpp"
    +#include "utility/ColumnVectorCache.hpp"
    +
    +namespace quickstep {
    +
    +struct SubBlocksReference;
    +
    +ScalarSharedExpression::ScalarSharedExpression(const int share_id,
    +                                               Scalar *operand)
    +    : Scalar(operand->getType()),
    +      share_id_(share_id),
    +      operand_(operand) {
    +}
    --- End diff --
    
    Suggest to move the constructor in the header file, as there are no expensive operations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112826924
  
    --- Diff: expressions/scalar/Scalar.hpp ---
    @@ -70,6 +75,11 @@ class Scalar {
       virtual ~Scalar() {
    --- End diff --
    
    Change to `~Scalar() override {`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112827764
  
    --- Diff: expressions/scalar/ScalarCaseExpression.cpp ---
    @@ -420,15 +429,15 @@ void ScalarCaseExpression::MultiplexNativeColumnVector(
     void ScalarCaseExpression::MultiplexIndirectColumnVector(
         const TupleIdSequence *source_sequence,
         const TupleIdSequence &case_matches,
    -    IndirectColumnVector *case_result,
    +    const IndirectColumnVector &case_result,
         IndirectColumnVector *output) {
       if (source_sequence == nullptr) {
         TupleIdSequence::const_iterator output_pos_it = case_matches.begin();
         for (std::size_t input_pos = 0;
    -         input_pos < case_result->size();
    +         input_pos < case_result.size();
              ++input_pos, ++output_pos_it) {
           output->positionalWriteTypedValue(*output_pos_it,
    -                                        case_result->moveTypedValue(input_pos));
    --- End diff --
    
    Remove unused `moveTypedValue` method in `types/containers/ColumnVector.hpp`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #237: QUICKSTEP-89 Add support for common s...

Posted by zuyu <gi...@git.apache.org>.
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/237#discussion_r112828010
  
    --- Diff: expressions/scalar/ScalarSharedExpression.hpp ---
    @@ -0,0 +1,127 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + **/
    +
    +#ifndef QUICKSTEP_EXPRESSIONS_SCALAR_SCALAR_SHARED_EXPRESSION_HPP_
    +#define QUICKSTEP_EXPRESSIONS_SCALAR_SCALAR_SHARED_EXPRESSION_HPP_
    +
    +#include <memory>
    +#include <string>
    +#include <utility>
    +#include <vector>
    +
    +#include "catalog/CatalogTypedefs.hpp"
    +#include "expressions/Expressions.pb.h"
    +#include "expressions/scalar/Scalar.hpp"
    +#include "storage/StorageBlockInfo.hpp"
    +#include "types/TypedValue.hpp"
    +#include "types/containers/ColumnVector.hpp"
    +#include "utility/Macros.hpp"
    +
    +namespace quickstep {
    +
    +class ColumnVectorCache;
    +class ValueAccessor;
    +
    +struct SubBlocksReference;
    +
    +/** \addtogroup Expressions
    + *  @{
    + */
    +
    +/**
    + * @brief Scalars that represent common subexpressions whose results are cached
    + *        and shared.
    + **/
    +class ScalarSharedExpression : public Scalar {
    + public:
    +  /**
    +   * @brief Constructor.
    +   *
    +   * @param share_id The unique integer identifier for each equivalence class of
    +   *        common subexpressions.
    +   * @param operand The underlying scalar subexpression.
    --- End diff --
    
    Add comments stating that it takes ownership of `operand`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---