You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/13 03:14:51 UTC

[GitHub] [incubator-doris] zenoyang opened a new pull request, #10084: [feature-wip](vectorized) Support block aggregate in scanner

zenoyang opened a new pull request, #10084:
URL: https://github.com/apache/incubator-doris/pull/10084

   # Proposed changes
   
   Issue Number: close #10083 
   In scanner thread, the same key of adjacent rows is aggregated, which reduces data transmission and improves the parallelism of aggregation.
   
   Note: The current ScanNode does not have column pruning, and the predicate column is included when comparing the key column, resulting in a poor aggregation effect. After waiting for column clipping support, optimize this problem.
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] spaces-X commented on a diff in pull request #10084: [feature-wip](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
spaces-X commented on code in PR #10084:
URL: https://github.com/apache/doris/pull/10084#discussion_r911079204


##########
be/src/vec/olap/block_reader.cpp:
##########
@@ -154,6 +177,49 @@ Status BlockReader::init(const ReaderParams& read_params) {
     return Status::OK();
 }
 
+Status BlockReader::next_block_with_aggregation(Block* block, MemPool* mem_pool,
+                                                ObjectPool* agg_pool, bool* eof) {
+    return _block_aggregator ? _agg_next_block(block, mem_pool, agg_pool, eof)
+                             : (this->*_next_block_func)(block, mem_pool, agg_pool, eof);
+}
+
+Status BlockReader::_agg_next_block(Block* block, MemPool* mem_pool, ObjectPool* agg_pool,
+                                    bool* eof) {
+    Status status;
+    while (true) {
+        if (_block_aggregator->source_exhausted()) {
+            status = (this->*_next_block_func)(block, mem_pool, agg_pool, eof);
+            if (UNLIKELY(!status.ok())) {
+                return status;
+            }
+            if (UNLIKELY(status.precise_code() == OLAP_ERR_DATA_EOF || block->rows() == 0)) {
+                break;
+            }
+
+            _block_aggregator->update_source(block);

Review Comment:
   nit: is it necessary to sort the block when the `group by` column is not at the very beginning in table schema?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #10084:
URL: https://github.com/apache/doris/pull/10084#issuecomment-1520976931

   We're closing this PR because it hasn't been updated in a while.
   This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and feel free a maintainer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zenoyang commented on a diff in pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
zenoyang commented on code in PR #10084:
URL: https://github.com/apache/doris/pull/10084#discussion_r969622110


##########
be/src/vec/olap/block_aggregator.cpp:
##########
@@ -0,0 +1,234 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "vec/olap/block_aggregator.h"
+
+#include "util/simd/bits.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/columns_number.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type_factory.hpp"
+
+namespace doris::vectorized {
+
+template <typename ColumnType>
+inline void compare_previous(const IColumn& col, uint8_t* flags) {
+    if (col.is_nullable()) {
+        auto col_ptr = reinterpret_cast<const ColumnNullable*>(&col);
+        for (int i = 1; i < col_ptr->size(); ++i) {
+            flags[i] &= (col_ptr->compare_at(i, i - 1, col, -1) == 0);
+        }
+    } else {
+        auto col_ptr = reinterpret_cast<const ColumnType*>(&col);
+        for (int i = 1; i < col_ptr->size(); ++i) {
+            flags[i] &= (col_ptr->compare_at(i, i - 1, col, -1) == 0);
+        }
+    }
+}
+
+BlockAggregator::BlockAggregator(const TabletSchema* tablet_schema,
+                                 const std::vector<uint32_t>& output_columns,
+                                 const std::vector<uint32_t>& return_columns,
+                                 const std::unordered_set<uint32_t>* null_set, const int batch_size)
+        : _tablet_schema(tablet_schema),
+          _output_columns(output_columns),
+          _sorted_output_columns(output_columns),
+          _return_columns(return_columns),
+          _null_set(null_set),
+          _batch_size(batch_size) {
+    _has_agg_data = false;
+    _do_aggregate = true;
+    _agg_ratio = config::block_aggregate_ratio;
+    _num_columns = _output_columns.size();
+    _num_key_columns = 0;
+
+    // todo(zeno) Make sure column prune done
+    std::sort(_sorted_output_columns.begin(), _sorted_output_columns.end());
+    _output_columns_loc.resize(_num_columns);
+    for (int i = 0; i < _num_columns; ++i) {
+        for (int j = 0; j < _num_columns; ++j) {
+            if (_sorted_output_columns[i] == output_columns[j]) {
+                _output_columns_loc[i] = j;
+                break;
+            }
+        }
+    }
+
+    for (auto index : _sorted_output_columns) {
+        if (_tablet_schema->column(index).is_key()) {
+            _num_key_columns++;
+        }
+    }
+
+    _current_row = 0;
+    _source_size = 0;
+    _eq_previous.reserve(_batch_size);
+    _agg_index.reserve(_batch_size);
+    _agg_range.reserve(_batch_size);
+
+    // init _key_comparator
+    for (int i = 0; i < _num_key_columns; ++i) {
+        _key_comparator.emplace_back(
+                _get_comparator(_tablet_schema->column(_sorted_output_columns[i])));
+    }
+
+    // init _column_aggregator
+    for (int i = 0; i < _num_key_columns; ++i) {
+        _column_aggregator.emplace_back(std::make_unique<KeyColumnAggregator>());
+    }
+
+    for (int i = _num_key_columns; i < _num_columns; ++i) {
+        bool is_nullable = (_null_set != nullptr &&
+                            _null_set->find(_sorted_output_columns[i]) != _null_set->end());
+        auto col = _tablet_schema->column(_sorted_output_columns[i]);
+        auto data_type = vectorized::DataTypeFactory::instance().create_data_type(col, is_nullable);
+        _column_aggregator.emplace_back(std::make_unique<ValueColumnAggregator>(col, data_type));
+    }
+
+    aggregate_reset();
+}
+
+CompareFunc BlockAggregator::_get_comparator(const TabletColumn& col) {
+    switch (col.type()) {
+    case OLAP_FIELD_TYPE_TINYINT:
+        return &compare_previous<ColumnInt8>;
+    case OLAP_FIELD_TYPE_SMALLINT:
+        return &compare_previous<ColumnInt16>;
+    case OLAP_FIELD_TYPE_INT:
+        return &compare_previous<ColumnInt32>;
+    case OLAP_FIELD_TYPE_BIGINT:
+        return &compare_previous<ColumnInt64>;
+    case OLAP_FIELD_TYPE_LARGEINT:
+        return &compare_previous<ColumnInt128>;
+    case OLAP_FIELD_TYPE_BOOL:
+        return &compare_previous<ColumnUInt8>;
+    case OLAP_FIELD_TYPE_FLOAT:
+        return &compare_previous<ColumnFloat32>;
+    case OLAP_FIELD_TYPE_DOUBLE:
+        return &compare_previous<ColumnFloat64>;
+    case OLAP_FIELD_TYPE_DECIMAL:
+        return &compare_previous<ColumnDecimal<Decimal128>>;
+    case OLAP_FIELD_TYPE_CHAR:
+    case OLAP_FIELD_TYPE_VARCHAR:
+    case OLAP_FIELD_TYPE_STRING:
+        return &compare_previous<ColumnString>;
+    case OLAP_FIELD_TYPE_DATE:

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zenoyang commented on a diff in pull request #10084: [feature-wip](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
zenoyang commented on code in PR #10084:
URL: https://github.com/apache/doris/pull/10084#discussion_r901839029


##########
be/src/vec/olap/block_reader.cpp:
##########
@@ -133,6 +133,29 @@ Status BlockReader::init(const ReaderParams& read_params) {
         return status;
     }
 
+    // only group by key num <= 3 and has bitmap_union col, enable block aggregate
+    if (config::enable_block_aggregate_in_scanner && _aggregation &&
+        (_tablet->keys_type() == AGG_KEYS)) {
+        int agg_key_size = 0;
+        for (int i = 0; i < return_column_size; ++i) {
+            auto cid = read_params.origin_return_columns->at(i);
+            auto& col = _tablet->tablet_schema().column(cid);
+            if (col.is_key() && ++agg_key_size > 3) {
+                _block_aggregation = false;
+                break;
+            }
+            if (!col.is_key() &&
+                col.aggregation() == FieldAggregationMethod::OLAP_FIELD_AGGREGATION_BITMAP_UNION) {

Review Comment:
   May be. Because the aggregated single-threaded throughput such as `sum` and `count` is already very large, whether it is necessary to use multi-threading requires more testing.
   I'll provide more tests soon to decide if it's more general.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zenoyang commented on pull request #10084: [feature-wip](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
zenoyang commented on PR #10084:
URL: https://github.com/apache/doris/pull/10084#issuecomment-1165502288

   > Hi, I got a compile fail on clang
   > 
   > ```c++
   > segment_iterator.cpp:893:43: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
   >             } else if (0xffffffffffffffff == mask) {
   > ```
   > 
   > maybe we can use `UINT_MAX` here.
   
   Fixed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Gabriel39 commented on a diff in pull request #10084: [feature-wip](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #10084:
URL: https://github.com/apache/incubator-doris/pull/10084#discussion_r899757613


##########
be/src/vec/olap/block_reader.cpp:
##########
@@ -133,6 +133,29 @@ Status BlockReader::init(const ReaderParams& read_params) {
         return status;
     }
 
+    // only group by key num <= 3 and has bitmap_union col, enable block aggregate
+    if (config::enable_block_aggregate_in_scanner && _aggregation &&
+        (_tablet->keys_type() == AGG_KEYS)) {
+        int agg_key_size = 0;
+        for (int i = 0; i < return_column_size; ++i) {
+            auto cid = read_params.origin_return_columns->at(i);
+            auto& col = _tablet->tablet_schema().column(cid);
+            if (col.is_key() && ++agg_key_size > 3) {

Review Comment:
   if `col1` and `col2` are both table keys and agg keys, `_block_aggregation` should also be false because we will get a block  which is already aggregated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] Gabriel39 commented on a diff in pull request #10084: [feature-wip](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #10084:
URL: https://github.com/apache/incubator-doris/pull/10084#discussion_r899760811


##########
be/src/vec/olap/block_reader.cpp:
##########
@@ -133,6 +133,29 @@ Status BlockReader::init(const ReaderParams& read_params) {
         return status;
     }
 
+    // only group by key num <= 3 and has bitmap_union col, enable block aggregate
+    if (config::enable_block_aggregate_in_scanner && _aggregation &&
+        (_tablet->keys_type() == AGG_KEYS)) {
+        int agg_key_size = 0;
+        for (int i = 0; i < return_column_size; ++i) {
+            auto cid = read_params.origin_return_columns->at(i);
+            auto& col = _tablet->tablet_schema().column(cid);
+            if (col.is_key() && ++agg_key_size > 3) {
+                _block_aggregation = false;
+                break;
+            }
+            if (!col.is_key() &&
+                col.aggregation() == FieldAggregationMethod::OLAP_FIELD_AGGREGATION_BITMAP_UNION) {

Review Comment:
   Maybe all agg will benefit from this? Could we make this optimization  more generic?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10084:
URL: https://github.com/apache/doris/pull/10084#issuecomment-1248870405

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] BiteTheDDDDt commented on pull request #10084: [feature-wip](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
BiteTheDDDDt commented on PR #10084:
URL: https://github.com/apache/doris/pull/10084#issuecomment-1161434344

   Hi, I got a compile fail on clang
   ```cpp
   segment_iterator.cpp:893:43: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always false [-Werror,-Wtautological-constant-out-of-range-compare]
               } else if (0xffffffffffffffff == mask) {
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
yiguolei commented on PR #10084:
URL: https://github.com/apache/doris/pull/10084#issuecomment-1291394891

   I think we could merge it to 1.1-lts first. 
   For master, there are lots of issues related with parallelism. For example, there is only one node for bitmap_union function. we could improve performance by using multiple thread. 
   I think we should consider about Pipeline mechanism. 
   
   We could talk about pipeline next week  @liutang123 @wangbo 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Gabriel39 commented on a diff in pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #10084:
URL: https://github.com/apache/doris/pull/10084#discussion_r969110789


##########
be/src/util/simd/bits.h:
##########
@@ -89,16 +92,31 @@ inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     return num;
 }
 
-inline size_t count_zero_num(const int8_t* __restrict data, const uint8_t* __restrict null_map,

Review Comment:
   why remove `__restrict` ?



##########
be/src/vec/columns/column_decimal.cpp:
##########
@@ -240,13 +240,13 @@ ColumnPtr ColumnDecimal<T>::filter(const IColumn::Filter& filt, ssize_t result_s
         *  completely pass or do not pass the filter.
         * Therefore, we will optimistically check the parts of `SIMD_BYTES` values.
         */
-    static constexpr size_t SIMD_BYTES = 32;
+    static constexpr size_t SIMD_BYTES = 64;
     const UInt8* filt_end_sse = filt_pos + size / SIMD_BYTES * SIMD_BYTES;
 
     while (filt_pos < filt_end_sse) {
-        uint32_t mask = simd::bytes32_mask_to_bits32_mask(filt_pos);
+        auto mask = simd::bytes64_mask_to_bits64_mask(filt_pos);
 
-        if (0xFFFFFFFF == mask) {
+        if (0xFFFFFFFFFFFFFFFF == mask) {

Review Comment:
   Could you make this mask to a common constexpr? I see it is used in multiple places.



##########
be/src/util/simd/bits.h:
##########
@@ -89,16 +92,31 @@ inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     return num;
 }
 
-inline size_t count_zero_num(const int8_t* __restrict data, const uint8_t* __restrict null_map,

Review Comment:
   This function seems never used. Could we remove it?



##########
be/src/vec/olap/block_reader.cpp:
##########
@@ -168,6 +177,49 @@ Status BlockReader::init(const ReaderParams& read_params) {
     return Status::OK();
 }
 
+Status BlockReader::next_block_with_aggregation(Block* block, MemPool* mem_pool,
+                                                ObjectPool* agg_pool, bool* eof) {
+    return _block_aggregator ? _agg_next_block(block, mem_pool, agg_pool, eof)

Review Comment:
   If agg is push down, we'd better to let `_next_block_func` be `BlockReader::_agg_next_block` instead of make this condition judgment each time



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zenoyang commented on a diff in pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
zenoyang commented on code in PR #10084:
URL: https://github.com/apache/doris/pull/10084#discussion_r969245544


##########
be/src/vec/olap/block_reader.cpp:
##########
@@ -168,6 +177,49 @@ Status BlockReader::init(const ReaderParams& read_params) {
     return Status::OK();
 }
 
+Status BlockReader::next_block_with_aggregation(Block* block, MemPool* mem_pool,
+                                                ObjectPool* agg_pool, bool* eof) {
+    return _block_aggregator ? _agg_next_block(block, mem_pool, agg_pool, eof)

Review Comment:
   Great suggestion, I've changed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] closed pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #10084: [feature](vectorized) Support block aggregate in scanner
URL: https://github.com/apache/doris/pull/10084


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zenoyang commented on a diff in pull request #10084: [feature-wip](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
zenoyang commented on code in PR #10084:
URL: https://github.com/apache/doris/pull/10084#discussion_r901835915


##########
be/src/vec/olap/block_reader.cpp:
##########
@@ -133,6 +133,29 @@ Status BlockReader::init(const ReaderParams& read_params) {
         return status;
     }
 
+    // only group by key num <= 3 and has bitmap_union col, enable block aggregate
+    if (config::enable_block_aggregate_in_scanner && _aggregation &&
+        (_tablet->keys_type() == AGG_KEYS)) {
+        int agg_key_size = 0;
+        for (int i = 0; i < return_column_size; ++i) {
+            auto cid = read_params.origin_return_columns->at(i);
+            auto& col = _tablet->tablet_schema().column(cid);
+            if (col.is_key() && ++agg_key_size > 3) {

Review Comment:
   no. If `col1` and `col2` are both table keys and agg keys, the block we get is aggregated based on all keys in the aggregation table, not based on the group by key column. So `_block_aggregation` still needs to be true



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10084:
URL: https://github.com/apache/doris/pull/10084#issuecomment-1248870421

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Gabriel39 commented on a diff in pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #10084:
URL: https://github.com/apache/doris/pull/10084#discussion_r969515435


##########
be/src/vec/olap/block_aggregator.cpp:
##########
@@ -0,0 +1,234 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "vec/olap/block_aggregator.h"
+
+#include "util/simd/bits.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/columns_number.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type_factory.hpp"
+
+namespace doris::vectorized {
+
+template <typename ColumnType>
+inline void compare_previous(const IColumn& col, uint8_t* flags) {
+    if (col.is_nullable()) {
+        auto col_ptr = reinterpret_cast<const ColumnNullable*>(&col);
+        for (int i = 1; i < col_ptr->size(); ++i) {
+            flags[i] &= (col_ptr->compare_at(i, i - 1, col, -1) == 0);
+        }
+    } else {
+        auto col_ptr = reinterpret_cast<const ColumnType*>(&col);
+        for (int i = 1; i < col_ptr->size(); ++i) {
+            flags[i] &= (col_ptr->compare_at(i, i - 1, col, -1) == 0);
+        }
+    }
+}
+
+BlockAggregator::BlockAggregator(const TabletSchema* tablet_schema,
+                                 const std::vector<uint32_t>& output_columns,
+                                 const std::vector<uint32_t>& return_columns,
+                                 const std::unordered_set<uint32_t>* null_set, const int batch_size)
+        : _tablet_schema(tablet_schema),
+          _output_columns(output_columns),
+          _sorted_output_columns(output_columns),
+          _return_columns(return_columns),
+          _null_set(null_set),
+          _batch_size(batch_size) {
+    _has_agg_data = false;
+    _do_aggregate = true;
+    _agg_ratio = config::block_aggregate_ratio;
+    _num_columns = _output_columns.size();
+    _num_key_columns = 0;
+
+    // todo(zeno) Make sure column prune done
+    std::sort(_sorted_output_columns.begin(), _sorted_output_columns.end());
+    _output_columns_loc.resize(_num_columns);
+    for (int i = 0; i < _num_columns; ++i) {
+        for (int j = 0; j < _num_columns; ++j) {
+            if (_sorted_output_columns[i] == output_columns[j]) {
+                _output_columns_loc[i] = j;
+                break;
+            }
+        }
+    }
+
+    for (auto index : _sorted_output_columns) {
+        if (_tablet_schema->column(index).is_key()) {
+            _num_key_columns++;
+        }
+    }
+
+    _current_row = 0;
+    _source_size = 0;
+    _eq_previous.reserve(_batch_size);
+    _agg_index.reserve(_batch_size);
+    _agg_range.reserve(_batch_size);
+
+    // init _key_comparator
+    for (int i = 0; i < _num_key_columns; ++i) {
+        _key_comparator.emplace_back(
+                _get_comparator(_tablet_schema->column(_sorted_output_columns[i])));
+    }
+
+    // init _column_aggregator
+    for (int i = 0; i < _num_key_columns; ++i) {
+        _column_aggregator.emplace_back(std::make_unique<KeyColumnAggregator>());
+    }
+
+    for (int i = _num_key_columns; i < _num_columns; ++i) {
+        bool is_nullable = (_null_set != nullptr &&
+                            _null_set->find(_sorted_output_columns[i]) != _null_set->end());
+        auto col = _tablet_schema->column(_sorted_output_columns[i]);
+        auto data_type = vectorized::DataTypeFactory::instance().create_data_type(col, is_nullable);
+        _column_aggregator.emplace_back(std::make_unique<ValueColumnAggregator>(col, data_type));
+    }
+
+    aggregate_reset();
+}
+
+CompareFunc BlockAggregator::_get_comparator(const TabletColumn& col) {
+    switch (col.type()) {
+    case OLAP_FIELD_TYPE_TINYINT:
+        return &compare_previous<ColumnInt8>;
+    case OLAP_FIELD_TYPE_SMALLINT:
+        return &compare_previous<ColumnInt16>;
+    case OLAP_FIELD_TYPE_INT:
+        return &compare_previous<ColumnInt32>;
+    case OLAP_FIELD_TYPE_BIGINT:
+        return &compare_previous<ColumnInt64>;
+    case OLAP_FIELD_TYPE_LARGEINT:
+        return &compare_previous<ColumnInt128>;
+    case OLAP_FIELD_TYPE_BOOL:
+        return &compare_previous<ColumnUInt8>;
+    case OLAP_FIELD_TYPE_FLOAT:
+        return &compare_previous<ColumnFloat32>;
+    case OLAP_FIELD_TYPE_DOUBLE:
+        return &compare_previous<ColumnFloat64>;
+    case OLAP_FIELD_TYPE_DECIMAL:
+        return &compare_previous<ColumnDecimal<Decimal128>>;
+    case OLAP_FIELD_TYPE_CHAR:
+    case OLAP_FIELD_TYPE_VARCHAR:
+    case OLAP_FIELD_TYPE_STRING:
+        return &compare_previous<ColumnString>;
+    case OLAP_FIELD_TYPE_DATE:

Review Comment:
   OLAP_FIELD_TYPE_DATEV2,OLAP_FIELD_TYPE_DATETIMEV2,OLAP_FIELD_TYPE_DECIMAL32,OLAP_FIELD_TYPE_DECIMAL64,OLAP_FIELD_TYPE_DECIMAL128 also need to be processed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zenoyang commented on a diff in pull request #10084: [feature](vectorized) Support block aggregate in scanner

Posted by GitBox <gi...@apache.org>.
zenoyang commented on code in PR #10084:
URL: https://github.com/apache/doris/pull/10084#discussion_r969180648


##########
be/src/util/simd/bits.h:
##########
@@ -89,16 +92,31 @@ inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     return num;
 }
 
-inline size_t count_zero_num(const int8_t* __restrict data, const uint8_t* __restrict null_map,

Review Comment:
   Because `__restrict` is redundant for `const` pointers. I see this function is called in `VRuntimeFilterWrapper::execute` and cannot be remove.



##########
be/src/util/simd/bits.h:
##########
@@ -89,16 +92,31 @@ inline size_t count_zero_num(const int8_t* __restrict data, size_t size) {
     return num;
 }
 
-inline size_t count_zero_num(const int8_t* __restrict data, const uint8_t* __restrict null_map,

Review Comment:
   Because `__restrict` is redundant for `const` pointers. I see this function is called in `VRuntimeFilterWrapper::execute` and cannot be remove.



##########
be/src/vec/columns/column_decimal.cpp:
##########
@@ -240,13 +240,13 @@ ColumnPtr ColumnDecimal<T>::filter(const IColumn::Filter& filt, ssize_t result_s
         *  completely pass or do not pass the filter.
         * Therefore, we will optimistically check the parts of `SIMD_BYTES` values.
         */
-    static constexpr size_t SIMD_BYTES = 32;
+    static constexpr size_t SIMD_BYTES = 64;
     const UInt8* filt_end_sse = filt_pos + size / SIMD_BYTES * SIMD_BYTES;
 
     while (filt_pos < filt_end_sse) {
-        uint32_t mask = simd::bytes32_mask_to_bits32_mask(filt_pos);
+        auto mask = simd::bytes64_mask_to_bits64_mask(filt_pos);
 
-        if (0xFFFFFFFF == mask) {
+        if (0xFFFFFFFFFFFFFFFF == mask) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org