You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@quickstep.apache.org by rathijit <gi...@git.apache.org> on 2016/08/10 22:04:57 UTC

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

GitHub user rathijit opened a pull request:

    https://github.com/apache/incubator-quickstep/pull/90

    Quickstep 28 29

    More efficient handling of aggregations by having a combined payload in the hash table. This also removes ValueT parametrization for the hash table data structure. Please refer to JIRA tickets QUICKSTEP-28 and QUICKSTEP-29 for more details.
    TPC-H query 01 sees significant performance improvements with these changes.
    
    Currently, only SeparateChaining hash table type is supported.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-quickstep quickstep-28-29

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-quickstep/pull/90.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #90
    
----
commit 39485d620402273890e92072e05f856fd9da7f9d
Author: rathijit <ra...@node-2.hashtable.quickstep-pg0.wisc.cloudlab.us>
Date:   2016-07-04T07:44:48Z

    Initial commit for QUICKSTEP-28 and QUICKSTEP-29. Code refactoring and cleanup, some more optimizations are pending.

commit d9c9e6867d3cbd6ed6209e944a2b7d91748e40d3
Author: rathijit <ra...@node-2.aggregation.quickstep-pg0.wisc.cloudlab.us>
Date:   2016-08-05T11:00:12Z

    Fixed 4 failures on unit tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r78227477
  
    --- Diff: storage/StorageBlock.cpp ---
    @@ -494,6 +494,91 @@ void StorageBlock::aggregateGroupBy(
                                                  hash_table);
     }
     
    +
    +void StorageBlock::aggregateGroupByFast(
    +    const std::vector<std::vector<std::unique_ptr<const Scalar>>> &arguments,
    +    const std::vector<std::unique_ptr<const Scalar>> &group_by,
    +    const Predicate *predicate,
    +    AggregationStateHashTableBase *hash_table,
    +    std::unique_ptr<TupleIdSequence> *reuse_matches,
    +    std::vector<std::unique_ptr<ColumnVector>> *reuse_group_by_vectors) const {
    +  DCHECK_GT(group_by.size(), 0u)
    +      << "Called aggregateGroupBy() with zero GROUP BY expressions";
    +
    +  SubBlocksReference sub_blocks_ref(*tuple_store_,
    +                                    indices_,
    +                                    indices_consistent_);
    +
    +  // IDs of 'arguments' as attributes in the ValueAccessor we create below.
    +  std::vector<attribute_id> arg_ids;
    +  std::vector<std::vector<attribute_id>> argument_ids;
    +
    +  // IDs of GROUP BY key element(s) in the ValueAccessor we create below.
    +  std::vector<attribute_id> key_ids;
    +
    +  // An intermediate ValueAccessor that stores the materialized 'arguments' for
    +  // this aggregate, as well as the GROUP BY expression values.
    +  ColumnVectorsValueAccessor temp_result;
    +  {
    +    std::unique_ptr<ValueAccessor> accessor;
    +    if (predicate) {
    +      if (!*reuse_matches) {
    +        // If there is a filter predicate that hasn't already been evaluated,
    +        // evaluate it now and save the results for other aggregates on this
    +        // same block.
    +        reuse_matches->reset(getMatchesForPredicate(predicate));
    +      }
    +
    +      // Create a filtered ValueAccessor that only iterates over predicate
    +      // matches.
    +      accessor.reset(tuple_store_->createValueAccessor(reuse_matches->get()));
    +    } else {
    +      // Create a ValueAccessor that iterates over all tuples in this block
    +      accessor.reset(tuple_store_->createValueAccessor());
    +    }
    +
    +    attribute_id attr_id = 0;
    +
    +    // First, put GROUP BY keys into 'temp_result'.
    +    if (reuse_group_by_vectors->empty()) {
    +      // Compute GROUP BY values from group_by Scalars, and store them in
    +      // reuse_group_by_vectors for reuse by other aggregates on this same
    +      // block.
    +      reuse_group_by_vectors->reserve(group_by.size());
    +      for (const std::unique_ptr<const Scalar> &group_by_element : group_by) {
    +        reuse_group_by_vectors->emplace_back(
    +            group_by_element->getAllValues(accessor.get(), &sub_blocks_ref));
    +        temp_result.addColumn(reuse_group_by_vectors->back().get(), false);
    +        key_ids.push_back(attr_id++);
    +      }
    +    } else {
    +      // Reuse precomputed GROUP BY values from reuse_group_by_vectors.
    +      DCHECK_EQ(group_by.size(), reuse_group_by_vectors->size())
    +          << "Wrong number of reuse_group_by_vectors";
    +      for (const std::unique_ptr<ColumnVector> &reuse_cv : *reuse_group_by_vectors) {
    +        temp_result.addColumn(reuse_cv.get(), false);
    +        key_ids.push_back(attr_id++);
    +      }
    +    }
    +
    +    // Compute argument vectors and add them to 'temp_result'.
    +    for (const std::vector<std::unique_ptr<const Scalar>> &argument : arguments) {
    +        arg_ids.clear();
    +        for (const std::unique_ptr<const Scalar> &args : argument) {
    +          temp_result.addColumn(args->getAllValues(accessor.get(), &sub_blocks_ref));
    +          arg_ids.push_back(attr_id++);
    +        }
    +        argument_ids.push_back(arg_ids);
    +     }
    +  }
    +
    +  hash_table->upsertValueAccessorCompositeKeyFast(argument_ids,
    --- End diff --
    
    Due to this call, we have to create a virtual function ``upsertValueAccessorCompositeKeyFast`` in the HashTableBase class, to avoid a circular dependency. An alternative is as follows:
    Make the aggregateGroupBy function return the value accessor and other information required for this call. The caller of aggregateGroupBy will perform the upsert operation on the hash table, thereby eliminating the need for this function call. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/90
  
    We can get rid of all the ``put*`` methods in the FastHashTable implementation, as they are not used elsewhere in the code. This would also simplify the concurrency related concerns related to the FastHashTable. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74787900
  
    --- Diff: expressions/aggregation/AggregationConcreteHandle.hpp ---
    @@ -302,6 +191,12 @@ class HashTableAggregateFinalizer {
         output_column_vector_->appendTypedValue(handle_.finalizeHashTableEntry(group_state));
       }
     
    +  inline void operator()(const std::vector<TypedValue> &group_by_key,
    --- End diff --
    
    Can you please add doxygen for this functor? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74795227
  
    --- Diff: expressions/aggregation/AggregationHandleCount.hpp ---
    @@ -62,6 +63,10 @@ class AggregationStateCount : public AggregationState {
        */
       ~AggregationStateCount() override {}
     
    +  size_t getPayloadSize() const {
    --- End diff --
    
    It is considered a good practice to use the ``std`` prefix in the header file. Therefore, can you please change the ``size_t`` to ``std::size_t``?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r78227666
  
    --- Diff: storage/StorageBlock.hpp ---
    @@ -468,6 +468,14 @@ class StorageBlock : public StorageBlockBase {
                             std::vector<std::unique_ptr<ColumnVector>>
                                 *reuse_group_by_vectors) const;
     
    +
    +  void aggregateGroupByFast(const std::vector<std::vector<std::unique_ptr<const Scalar>>> &arguments,
    --- End diff --
    
    Remove the older ``aggregateGroupBy`` function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/90
  
    Hi @rathijit 
    
    Thanks for summarizing the review discussion. The current travis failure is only for GCC and looks like it doesn't like signed and unsigned comparison (Clang is fine with it). 
    
    Is there a way we can address the second point in the future work - Stop calling finalize for non-avg aggregates in the same PR? If each aggregate handle can tell its type, we can detect it much earlier and not even call the finalize hash table call on that handle. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74787536
  
    --- Diff: expressions/aggregation/AggregationConcreteHandle.hpp ---
    @@ -27,10 +27,12 @@
     #include "catalog/CatalogTypedefs.hpp"
     #include "expressions/aggregation/AggregationHandle.hpp"
     #include "storage/HashTable.hpp"
    +#include "storage/FastHashTable.hpp"
    --- End diff --
    
    As a convention, we sort the included files in alphabetical order. If you use vim, there's an easy way to do that. You can select all the #include statements in visual mode and type sort. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r78233635
  
    --- Diff: expressions/aggregation/AggregationHandleAvg.hpp ---
    @@ -109,8 +129,41 @@ class AggregationHandleAvg : public AggregationConcreteHandle {
         ++state->count_;
       }
     
    +  inline void iterateUnaryInlFast(const TypedValue &value,
    +                                  std::uint8_t *byte_ptr) const {
    +    DCHECK(value.isPlausibleInstanceOf(argument_type_.getSignature()));
    +    if (value.isNull()) return;
    +    TypedValue *sum_ptr =
    +        reinterpret_cast<TypedValue *>(byte_ptr + blank_state_.sum_offset_);
    +    std::int64_t *count_ptr =
    +        reinterpret_cast<std::int64_t *>(byte_ptr + blank_state_.count_offset_);
    +    *sum_ptr = fast_add_operator_->applyToTypedValues(*sum_ptr, value);
    +    ++(*count_ptr);
    +  }
    +
    +  inline void updateState(const std::vector<TypedValue> &arguments,
    --- End diff --
    
    We only need one argument here, instead of a vector. As a further optimization, we can use an Untyped value (i.e. a pointer) instead of a TypedValue. As the fast_add_operator_ in this class is already an UncheckedOperator, it can work with untyped values. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r78224619
  
    --- Diff: storage/FastHashTableFactory.hpp ---
    @@ -0,0 +1,300 @@
    +/**
    + *   Copyright 2015-2016 Pivotal Software, Inc.
    + *
    + *   Licensed under the Apache License, Version 2.0 (the "License");
    + *   you may not use this file except in compliance with the License.
    + *   You may obtain a copy of the License at
    + *
    + *       http://www.apache.org/licenses/LICENSE-2.0
    + *
    + *   Unless required by applicable law or agreed to in writing, software
    + *   distributed under the License is distributed on an "AS IS" BASIS,
    + *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + *   See the License for the specific language governing permissions and
    + *   limitations under the License.
    + **/
    +
    +#ifndef QUICKSTEP_STORAGE_FAST_HASH_TABLE_FACTORY_HPP_
    +#define QUICKSTEP_STORAGE_FAST_HASH_TABLE_FACTORY_HPP_
    +
    +#include <cstddef>
    +#include <string>
    +#include <vector>
    +
    +#include "storage/HashTable.hpp"
    +#include "storage/FastHashTable.hpp"
    +#include "storage/HashTableBase.hpp"
    +#include "storage/HashTableFactory.hpp"
    +#include "storage/HashTable.pb.h"
    +#include "storage/LinearOpenAddressingHashTable.hpp"
    +#include "storage/SeparateChainingHashTable.hpp"
    +#include "storage/FastSeparateChainingHashTable.hpp"
    +#include "storage/SimpleScalarSeparateChainingHashTable.hpp"
    +#include "storage/TupleReference.hpp"
    +#include "types/TypeFactory.hpp"
    +#include "utility/BloomFilter.hpp"
    +#include "utility/Macros.hpp"
    +
    +#include "glog/logging.h"
    +
    +namespace quickstep {
    +
    +class StorageManager;
    +class Type;
    +
    +/** \addtogroup Storage
    + *  @{
    + */
    +
    +/**
    + * @brief Templated all-static factory class that makes it easier to
    + *        instantiate HashTables with the particular HashTable implementation
    + *        chosen at runtime. All template parameters are exactly the same as
    + *        those of HashTable.
    + **/
    +template <bool resizable,
    +          bool serializable,
    +          bool force_key_copy,
    +          bool allow_duplicate_keys>
    +class FastHashTableFactory {
    + public:
    +  /**
    +   * @brief Create a new resizable HashTable, with the type selected by
    +   *        hash_table_type. Other parameters are forwarded to the HashTable's
    +   *        constructor.
    +   *
    +   * @param hash_table_type The specific HashTable implementation that should
    +   *        be used.
    +   * @param key_types A vector of one or more types (>1 indicates a composite
    +   *        key). Forwarded as-is to the HashTable's constructor.
    +   * @param num_entries The estimated number of entries the HashTable will
    +   *        hold. Forwarded as-is to the HashTable's constructor.
    +   * @param storage_manager The StorageManager to use (a StorageBlob will be
    +   *        allocated to hold the HashTable's contents). Forwarded as-is to the
    +   *        HashTable's constructor.
    +   * @return A new resizable HashTable.
    +   **/
    +  static FastHashTable<resizable, serializable, force_key_copy, allow_duplicate_keys>*
    +      CreateResizable(const HashTableImplType hash_table_type,
    +                      const std::vector<const Type*> &key_types,
    +                      const std::size_t num_entries,
    +                      const std::vector<std::size_t> &payload_sizes,
    +                      const std::vector<AggregationHandle *> &handles,
    +                      StorageManager *storage_manager) {
    +    DCHECK(resizable);
    +
    +    switch (hash_table_type) {
    +      case HashTableImplType::kSeparateChaining:
    +        return new FastSeparateChainingHashTable<
    +            resizable,
    +            serializable,
    +            force_key_copy,
    +            allow_duplicate_keys>(key_types, num_entries, payload_sizes, handles, storage_manager);
    +      case HashTableImplType::kLinearOpenAddressing:
    --- End diff --
    
    No need for this case. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r77855214
  
    --- Diff: expressions/aggregation/AggregationConcreteHandle.cpp ---
    @@ -51,22 +52,24 @@ void AggregationConcreteHandle::insertValueAccessorIntoDistinctifyHashTable(
         AggregationStateHashTableBase *distinctify_hash_table) const {
       // If the key-value pair is already there, we don't need to update the value,
       // which should always be "true". I.e. the value is just a placeholder.
    -  const auto noop_upserter = [](const auto &accessor, const bool *value) -> void {};
    +  //  const auto noop_upserter = [](const auto &accessor, const bool *value) -> void {};
    --- End diff --
    
    Remove the dead code. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/90


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74795338
  
    --- Diff: expressions/aggregation/AggregationHandleCount.hpp ---
    @@ -117,6 +127,34 @@ class AggregationHandleCount : public AggregationConcreteHandle {
         }
       }
     
    +  inline void iterateUnaryInlFast(const TypedValue &value, uint8_t *byte_ptr) const {
    +    if ((!nullable_type) || (!value.isNull())) {
    +      std::int64_t *count_ptr = reinterpret_cast<std::int64_t *>(byte_ptr);
    +      (*count_ptr)++;
    +    }
    +  }
    +
    +  inline void iterateInlFast(const std::vector<TypedValue> &arguments, uint8_t *byte_ptr) const override {
    +     if (block_update) return;
    +     if (arguments.size())
    --- End diff --
    
    As per the style convention, can you please add braces for the if-else blocks? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/90
  
    Replace the argument_ids argument to upsertValueAccessor* methods with a single vector. If there is no argument for a given handle, use an invalid (e.g. -1) value. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74795909
  
    --- Diff: storage/HashTableBase.hpp ---
    @@ -63,7 +75,11 @@ class HashTableBase {
      public:
       virtual ~HashTableBase() {
       }
    -
    +  virtual bool upsertValueAccessorCompositeKeyFast(
    --- End diff --
    
    Need a newline before this function. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74786349
  
    --- Diff: expressions/aggregation/AggregationHandle.hpp ---
    @@ -347,20 +348,15 @@ class AggregationHandle {
        */
       virtual void aggregateOnDistinctifyHashTableForGroupBy(
           const AggregationStateHashTableBase &distinctify_hash_table,
    -      AggregationStateHashTableBase *aggregation_hash_table) const = 0;
    +      AggregationStateHashTableBase *aggregation_hash_table,
    +      int index) const = 0;
    --- End diff --
    
    Can you please update the doxygen for the ``index`` parameter? Also, if the index is always positive, can you change its type from int to std::size_t?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #90: Quickstep 28 29

Posted by rathijit <gi...@git.apache.org>.
Github user rathijit commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/90
  
    Summary of review comments with @hbdeshmukh  on 08/12/2016.
    
    ## TODO
    * Fix cyclic dependencies
    * Pass bool to SpinMutex constructor instead of byte pointer
    * Add comment in code to discuss removal of align restrictions in hash table payload construction
    * Overall, add/update comments for doxygen
    
    ## DISCUSSIONS
    * Is a lock needed for the hash table payload? Probably yes.
    
    ## POTENTIAL FUTURE WORK
    * Combine aggregation handling in finalize
    * Stop calling finalize for non-avg aggregates


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74796117
  
    --- Diff: storage/StorageBlock.hpp ---
    @@ -466,6 +466,14 @@ class StorageBlock : public StorageBlockBase {
                             std::vector<std::unique_ptr<ColumnVector>>
                                 *reuse_group_by_vectors) const;
     
    +
    +  void aggregateGroupByFast(const std::vector<std::vector<std::unique_ptr<const Scalar>>> &arguments,
    +                        const std::vector<std::unique_ptr<const Scalar>> &group_by,
    +                        const Predicate *predicate,
    +                        AggregationStateHashTableBase *hash_table,
    +                        std::unique_ptr<TupleIdSequence> *reuse_matches,
    +                        std::vector<std::unique_ptr<ColumnVector>>
    +                            *reuse_group_by_vectors) const;
    --- End diff --
    
    Can you please align the parameters and add doxygen comments? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/90
  
    Instead of using a vector of TypedValue ``local`` in the functions ``upsertValueAccessor*`` in FastHashTable, use a single TypedValue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74788256
  
    --- Diff: expressions/aggregation/AggregationHandleAvg.hpp ---
    @@ -57,26 +58,39 @@ class AggregationStateAvg : public AggregationState {
        */
       AggregationStateAvg(const AggregationStateAvg &orig)
           : sum_(orig.sum_),
    -        count_(orig.count_) {
    +        count_(orig.count_),
    +        sum_offset(orig.sum_offset),
    +        count_offset(orig.count_offset),
    +        mutex_offset(orig.mutex_offset) {
       }
     
       /**
        * @brief Destructor.
        */
       ~AggregationStateAvg() override {}
     
    +  size_t getPayloadSize() const {
    +     size_t p1 = reinterpret_cast<size_t>(&sum_);
    +     size_t p2 = reinterpret_cast<size_t>(&mutex_);
    +     return (p2-p1);
    +  }
    +
      private:
       friend class AggregationHandleAvg;
     
       AggregationStateAvg()
    -      : sum_(0), count_(0) {
    +      : sum_(0), count_(0), sum_offset(0),
    +        count_offset(reinterpret_cast<uint8_t *>(&count_)-reinterpret_cast<uint8_t *>(&sum_)),
    +        mutex_offset(reinterpret_cast<uint8_t *>(&mutex_)-reinterpret_cast<uint8_t *>(&sum_)) {
       }
     
       // TODO(shoban): We might want to specialize sum_ and count_ to use atomics
       // for int types similar to in AggregationStateCount.
       TypedValue sum_;
       std::int64_t count_;
       SpinMutex mutex_;
    +
    +  int sum_offset, count_offset, mutex_offset;
    --- End diff --
    
    As a convention, the class member variables have an underscore prefix, e.g. ``count_`` and the local function variables don't have the underscore prefix e.g. ``count``. This helps in a better readability. Can you please make the appropriate changes? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/90
  
    In the FastHashTable::upsertValue* methods, why do we need a vector for attribute_ids?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74796480
  
    --- Diff: expressions/aggregation/CMakeLists.txt ---
    @@ -271,45 +278,45 @@ target_link_libraries(quickstep_expressions_aggregation
     # Tests:
     
     # Unified executable to ammortize cost of linking.
    -add_executable(AggregationHandle_tests
    -               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleAvg_unittest.cpp"
    -               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleCount_unittest.cpp"
    -               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleMax_unittest.cpp"
    -               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleMin_unittest.cpp"
    -               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleSum_unittest.cpp")
    -target_link_libraries(AggregationHandle_tests
    -                      gtest
    -                      gtest_main
    -                      quickstep_catalog_CatalogTypedefs
    -                      quickstep_expressions_aggregation_AggregateFunction
    -                      quickstep_expressions_aggregation_AggregateFunctionFactory
    -                      quickstep_expressions_aggregation_AggregationHandle
    -                      quickstep_expressions_aggregation_AggregationHandleAvg
    -                      quickstep_expressions_aggregation_AggregationHandleCount
    -                      quickstep_expressions_aggregation_AggregationHandleMax
    -                      quickstep_expressions_aggregation_AggregationHandleMin
    -                      quickstep_expressions_aggregation_AggregationHandleSum
    -                      quickstep_expressions_aggregation_AggregationID
    -                      quickstep_storage_HashTableBase
    -                      quickstep_storage_StorageManager
    -                      quickstep_types_CharType
    -                      quickstep_types_DateOperatorOverloads
    -                      quickstep_types_DatetimeIntervalType
    -                      quickstep_types_DatetimeType
    -                      quickstep_types_DoubleType
    -                      quickstep_types_FloatType
    -                      quickstep_types_IntType
    -                      quickstep_types_IntervalLit
    -                      quickstep_types_LongType
    -                      quickstep_types_Type
    -                      quickstep_types_TypeFactory
    -                      quickstep_types_TypeID
    -                      quickstep_types_TypedValue
    -                      quickstep_types_VarCharType
    -                      quickstep_types_YearMonthIntervalType
    -                      quickstep_types_containers_ColumnVector
    -                      quickstep_types_containers_ColumnVectorsValueAccessor
    -                      quickstep_types_operations_comparisons_Comparison
    -                      quickstep_types_operations_comparisons_ComparisonFactory
    -                      quickstep_types_operations_comparisons_ComparisonID)
    -add_test(AggregationHandle_tests AggregationHandle_tests)
    +# add_executable(AggregationHandle_tests
    +#               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleAvg_unittest.cpp"
    +#               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleCount_unittest.cpp"
    +#               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleMax_unittest.cpp"
    +#               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleMin_unittest.cpp"
    +#               "${CMAKE_CURRENT_SOURCE_DIR}/tests/AggregationHandleSum_unittest.cpp")
    +# target_link_libraries(AggregationHandle_tests
    +#                      gtest
    +#                      gtest_main
    +#                      quickstep_catalog_CatalogTypedefs
    +#                      quickstep_expressions_aggregation_AggregateFunction
    +#                      quickstep_expressions_aggregation_AggregateFunctionFactory
    +#                      quickstep_expressions_aggregation_AggregationHandle
    +#                      quickstep_expressions_aggregation_AggregationHandleAvg
    +#                      quickstep_expressions_aggregation_AggregationHandleCount
    +#                      quickstep_expressions_aggregation_AggregationHandleMax
    +#                      quickstep_expressions_aggregation_AggregationHandleMin
    +#                      quickstep_expressions_aggregation_AggregationHandleSum
    +#                      quickstep_expressions_aggregation_AggregationID
    +#                      quickstep_storage_HashTableBase
    +#                      quickstep_storage_StorageManager
    +#                      quickstep_types_CharType
    +#                      quickstep_types_DateOperatorOverloads
    +#                      quickstep_types_DatetimeIntervalType
    +#                      quickstep_types_DatetimeType
    +#                      quickstep_types_DoubleType
    +#                      quickstep_types_FloatType
    +#                      quickstep_types_IntType
    +#                      quickstep_types_IntervalLit
    +#                      quickstep_types_LongType
    +#                      quickstep_types_Type
    +#                      quickstep_types_TypeFactory
    +#                      quickstep_types_TypeID
    +#                      quickstep_types_TypedValue
    +#                      quickstep_types_VarCharType
    +#                      quickstep_types_YearMonthIntervalType
    +#                      quickstep_types_containers_ColumnVector
    +#                      quickstep_types_containers_ColumnVectorsValueAccessor
    +#                      quickstep_types_operations_comparisons_Comparison
    +#                      quickstep_types_operations_comparisons_ComparisonFactory
    +#                      quickstep_types_operations_comparisons_ComparisonID)
    +#add_test(AggregationHandle_tests AggregationHandle_tests)
    --- End diff --
    
    Can you elaborate on the changes related to tests? Are these tests performed somewhere else? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74795095
  
    --- Diff: expressions/aggregation/AggregationHandleAvg.hpp ---
    @@ -156,11 +217,12 @@ class AggregationHandleAvg : public AggregationConcreteHandle {
        */
       void aggregateOnDistinctifyHashTableForGroupBy(
           const AggregationStateHashTableBase &distinctify_hash_table,
    -      AggregationStateHashTableBase *aggregation_hash_table) const override;
    +      AggregationStateHashTableBase *aggregation_hash_table,
    +      int index) const override;
     
    -  void mergeGroupByHashTables(
    -      const AggregationStateHashTableBase &source_hash_table,
    -      AggregationStateHashTableBase *destination_hash_table) const override;
    +  size_t getPayloadSize() const override {
    --- End diff --
    
    It is considered a good practice to use the ``std`` prefix in the header file. Therefore, can you please change the ``size_t`` to ``std::size_t``?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74788446
  
    --- Diff: expressions/aggregation/AggregationHandleAvg.hpp ---
    @@ -109,6 +123,35 @@ class AggregationHandleAvg : public AggregationConcreteHandle {
         ++state->count_;
       }
     
    +  inline void iterateUnaryInlFast(const TypedValue &value, uint8_t *byte_ptr) const {
    +    DCHECK(value.isPlausibleInstanceOf(argument_type_.getSignature()));
    +    if (value.isNull()) return;
    +    TypedValue *sum_ptr = reinterpret_cast<TypedValue *>(byte_ptr + blank_state_.sum_offset);
    +    std::int64_t *count_ptr = reinterpret_cast<std::int64_t *>(byte_ptr + blank_state_.count_offset);
    +    *sum_ptr = fast_add_operator_->applyToTypedValues(*sum_ptr, value);
    +    ++(*count_ptr);
    +  }
    +
    +  inline void iterateInlFast(const std::vector<TypedValue> &arguments, uint8_t *byte_ptr) const override {
    +     if (block_update) return;
    --- End diff --
    
    Could be refactored to 
    ```
    if (!block_update) {
      iterateUnaryInlFast()
    
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep issue #90: Quickstep 28 29

Posted by rathijit <gi...@git.apache.org>.
Github user rathijit commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/90
  
    Hi @hbdeshmukh 
    
    Stopping the finalize needs some thought as finalize currently also copies values (TypedValue) to the result ColumnVector. With the new hash table, the full/combined payload could be copied after only avg. (if it exists) does its finalize. Extracting TypedValues from the payload without calling the aggregate methods can be difficult. It would perhaps also need a three-step finalize (distinct, avg., copy).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74795434
  
    --- Diff: expressions/aggregation/AggregationHandleMax.cpp ---
    @@ -109,43 +99,46 @@ void AggregationHandleMax::mergeStates(
       }
     }
     
    +void AggregationHandleMax::mergeStatesFast(
    +    const std::uint8_t *source,
    +    std::uint8_t *destination) const {
    +    const TypedValue *src_max_ptr = reinterpret_cast<const TypedValue *>(source);
    +    TypedValue *dst_max_ptr = reinterpret_cast<TypedValue *>(destination);
    +    if (!(src_max_ptr->isNull())) {
    +      compareAndUpdateFast(dst_max_ptr, *src_max_ptr);
    +  }
    --- End diff --
    
    Can you please align the indentation? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74788908
  
    --- Diff: expressions/aggregation/AggregationHandleAvg.hpp ---
    @@ -139,9 +185,24 @@ class AggregationHandleAvg : public AggregationConcreteHandle {
                                                     TypedValue(static_cast<double>(agg_state.count_)));
       }
     
    +  inline TypedValue finalizeHashTableEntryFast(const uint8_t *byte_ptr) const {
    +//    const AggregationStateAvg &agg_state = static_cast<const AggregationStateAvg&>(state);
    --- End diff --
    
    Can you please delete the dead code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74788824
  
    --- Diff: expressions/aggregation/AggregationHandleAvg.hpp ---
    @@ -127,6 +170,9 @@ class AggregationHandleAvg : public AggregationConcreteHandle {
       void mergeStates(const AggregationState &source,
                        AggregationState *destination) const override;
     
    +  void mergeStatesFast(const uint8_t *source,
    --- End diff --
    
    Can you please align the parameters? I use a vim plugin called clang-format which helps greatly in fixing all the style issues with minimal efforts. If you need help in configuring that plugin, please let me know. It's a big time saver. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74785810
  
    --- Diff: expressions/aggregation/AggregationHandle.hpp ---
    @@ -265,7 +265,8 @@ class AggregationHandle {
        **/
       virtual ColumnVector* finalizeHashTable(
           const AggregationStateHashTableBase &hash_table,
    -      std::vector<std::vector<TypedValue>> *group_by_keys) const = 0;
    +      std::vector<std::vector<TypedValue>> *group_by_keys,
    +      int index) const = 0;
    --- End diff --
    
    Can you please update the doxygen for the ``index`` parameter? Also, if the index is always positive, can you change its type from int to ``std::size_t``? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74787702
  
    --- Diff: expressions/aggregation/AggregationConcreteHandle.hpp ---
    @@ -27,10 +27,12 @@
     #include "catalog/CatalogTypedefs.hpp"
     #include "expressions/aggregation/AggregationHandle.hpp"
     #include "storage/HashTable.hpp"
    +#include "storage/FastHashTable.hpp"
    --- End diff --
    
    A minor clarification, all the #include statements in the current block. You can leave out the #include glog block below as is. The #include block below is for third party includes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-quickstep pull request #90: Quickstep 28 29

Posted by hbdeshmukh <gi...@git.apache.org>.
Github user hbdeshmukh commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/90#discussion_r74786607
  
    --- Diff: expressions/aggregation/AggregationHandle.hpp ---
    @@ -347,20 +348,15 @@ class AggregationHandle {
        */
       virtual void aggregateOnDistinctifyHashTableForGroupBy(
           const AggregationStateHashTableBase &distinctify_hash_table,
    -      AggregationStateHashTableBase *aggregation_hash_table) const = 0;
    +      AggregationStateHashTableBase *aggregation_hash_table,
    +      int index) const = 0;
     
    -  /**
    -   * @brief Merge two GROUP BY hash tables in one.
    -   *
    -   * @note Both the hash tables should have the same structure.
    -   *
    -   * @param source_hash_table The hash table which will get merged.
    -   * @param destination_hash_table The hash table to which we will merge the
    -   *        other hash table.
    -   **/
    -  virtual void mergeGroupByHashTables(
    -      const AggregationStateHashTableBase &source_hash_table,
    -      AggregationStateHashTableBase *destination_hash_table) const = 0;
    +  virtual size_t getPayloadSize() const {return 1;}
    +  virtual void iterateInlFast(const std::vector<TypedValue> &arguments, uint8_t *byte_ptr) const {}
    +  virtual void mergeStatesFast(const uint8_t *src, uint8_t *dst) const {}
    +  virtual void initPayload(uint8_t *byte_ptr) const {}
    +  virtual void BlockUpdate() {}
    +  virtual void AllowUpdate() {}
    --- End diff --
    
    Can you please add doxygen for these methods? Also, as a convention, non-static method names start with a lowercase letter - so can you please rename the methods to ``blockUpdate`` and ``allowUpdate``?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---