You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2023/01/13 13:19:33 UTC

[GitHub] [doris] xinyiZzz opened a new pull request, #15917: [improvement](scan) Support pushdown execute expr ctx

xinyiZzz opened a new pull request, #15917:
URL: https://github.com/apache/doris/pull/15917

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1122663273


##########
be/src/vec/exec/scan/vscan_node.cpp:
##########
@@ -458,7 +461,15 @@ Status VScanNode::_normalize_conjuncts() {
             RETURN_IF_ERROR(_normalize_predicate((*_vconjunct_ctx_ptr)->root(), &new_root));
             if (new_root) {
                 (*_vconjunct_ctx_ptr)->set_root(new_root);
-            } else {
+                if (_should_push_down_common_expr()) {
+                    // Copy _vconjunct_ctx_ptr to _common_vexpr_ctxs_pushdown and push down, mark _vconjunct_ctx_ptr as stale.

Review Comment:
   这里不需要这么写; 我感觉只要把两个指针交换一下,然后把_vconjunct_ctx_ptr 设置为nullptr 就行了



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115142997


##########
be/src/runtime/runtime_state.h:
##########
@@ -118,6 +118,11 @@ class RuntimeState {
                _query_options.check_overflow_for_decimal;
     }
 
+    bool enable_remaining_expr_pushdown() const {

Review Comment:
   use another name, for example: enable_common_expr_pushdown



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1116607158


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1160,58 +1189,86 @@ void SegmentIterator::_vec_init_lazy_materialization() {
         _is_need_short_eval = true;
     }
 
-    // Step 2: check non-predicate read costs to determine whether need lazy materialization
-    // fill _non_predicate_columns.
-    // After some optimization, we suppose lazy materialization is better performance.
+    // make _schema_block_id_map
+    _schema_block_id_map.resize(_schema.columns().size());
+    for (int i = 0; i < _schema.num_column_ids(); i++) {
+        auto cid = _schema.column_id(i);
+        _schema_block_id_map[cid] = i;
+    }
+
+    // Step2: extract columns that can execute expr context
+    _is_remaining_expr_column.resize(_schema.columns().size(), false);
+    if (_enable_remaining_expr_pushdown && _remaining_vconjunct_root != nullptr) {
+        _extract_remaining_expr_columns(_remaining_vconjunct_root);
+        if (!_remaining_expr_columns.empty()) {
+            _is_need_expr_eval = true;
+            for (auto cid : _schema.column_ids()) {
+                // pred column also needs to be filtered by expr
+                if (_is_remaining_expr_column[cid] || _is_pred_column[cid]) {
+                    auto loc = _schema_block_id_map[cid];
+                    _columns_to_filter.push_back(loc);
+                }
+            }
+        }
+    }
+
+    // Step 3: fill non predicate columns and second read columns
     if (_schema.column_ids().size() > pred_column_ids.size()) {
         for (auto cid : _schema.column_ids()) {
             if (!_is_pred_column[cid]) {
-                _non_predicate_columns.push_back(cid);
+                if (!_is_remaining_expr_column[cid]) {
+                    _non_predicate_columns.push_back(cid);
+                } else {
+                    _second_read_column_ids.push_back(cid);
+                }
                 if (_is_need_vec_eval || _is_need_short_eval) {
                     _lazy_materialization_read = true;
                 }
             }
         }
     }
 

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1102267002


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1449,9 +1573,136 @@ Status SegmentIterator::next_batch(vectorized::Block* block) {
                     block->get_by_position(i).column->permute(permutation, num_rows);
     }
 
+    // LOG(INFO) << "SegmentIterator::next_batch over block: " << block->each_col_size();
     return Status::OK();
 }
 
+static int execute_remaining_exp_num = 0;
+
+uint16_t SegmentIterator::_execute_remaining_expr(uint16_t* sel_rowid_idx, uint16_t selected_size,
+                                                  vectorized::Block* block) {
+    SCOPED_RAW_TIMER(&_opts.stats->expr_filter_ns);
+    DCHECK(_remaining_vconjunct_root != nullptr);
+    DCHECK(block->rows() != 0);
+    // LOG(INFO) << "_execute_remaining_expr 000 block: " << block->each_col_size();
+    // LOG(INFO) << "_execute_remaining_expr 111 block: " << _remaining_vconjunct_root->debug_string();
+    size_t prev_columns = block->columns();
+    int result_column_id = -1;
+    execute_remaining_exp_num++;
+    // LOG(INFO) << "_execute_remaining_expr 777 block: " << block->each_col_size() << ", result_column_id " << result_column_id
+    //             // << ", dump_data " << block->dump_data(0, 1)
+    //             << ", conjunct root: " << _remaining_vconjunct_root->debug_string()
+    //             << ", child size: " << _remaining_vconjunct_root->children().size()
+    //             << ", execute_remaining_exp_num " << execute_remaining_exp_num
+    //             << " block->rows() " << block->rows();
+    RETURN_IF_ERROR(_remaining_vconjunct_ctx->execute(block, &result_column_id));
+    vectorized::ColumnPtr filter_column = block->get_by_position(result_column_id).column;
+    if (auto* nullable_column =
+                vectorized::check_and_get_column<vectorized::ColumnNullable>(*filter_column)) {
+        // LOG(INFO) << "_execute_remaining_expr 111";
+        vectorized::ColumnPtr nested_column = nullable_column->get_nested_column_ptr();
+
+        vectorized::MutableColumnPtr mutable_holder =
+                nested_column->use_count() == 1
+                        ? nested_column->assume_mutable()
+                        : nested_column->clone_resized(nested_column->size());
+
+        vectorized::ColumnUInt8* concrete_column =
+                typeid_cast<vectorized::ColumnUInt8*>(mutable_holder.get());
+        if (!concrete_column) {
+            return Status::InvalidArgument(
+                    "Illegal type {} of column for filter. Must be UInt8 or Nullable(UInt8).",
+                    filter_column->get_name());
+        }
+        auto* __restrict null_map = nullable_column->get_null_map_data().data();
+        vectorized::IColumn::Filter& filter = concrete_column->get_data();
+        auto* __restrict filter_data = filter.data();
+
+        const size_t size = filter.size();
+        for (size_t i = 0; i < size; ++i) {
+            filter_data[i] &= !null_map[i];
+        }
+
+        selected_size = _evaluate_remaining_expr_filter(sel_rowid_idx, selected_size, filter);
+        vectorized::Block::filter_block_internal(block, _columns_to_filter, filter);
+    } else if (auto* const_column =
+                       vectorized::check_and_get_column<vectorized::ColumnConst>(*filter_column)) {
+        // LOG(INFO) << "_execute_remaining_expr 222";
+        bool ret = const_column->get_bool(0);

Review Comment:
   warning: std::move of the const expression has no effect; remove std::move() [performance-move-const-arg]
   
   ```suggestion
                   *block->get_by_position(col).column.assume_mutable()->clear();
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei merged pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei merged PR #15917:
URL: https://github.com/apache/doris/pull/15917


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by GitBox <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1071126734


##########
be/src/olap/schema.h:
##########
@@ -144,6 +144,7 @@ class Schema {
     const std::vector<ColumnId>& column_ids() const { return _col_ids; }
     const std::vector<int32_t>& unique_ids() const { return _unique_ids; }
     ColumnId column_id(size_t index) const { return _col_ids[index]; }
+    ColumnId column_id(const std::string& cname) const { return _field_name_to_col_id.at(cname); }

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1445395776

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1445527591

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1069483554


##########
be/src/olap/schema.h:
##########
@@ -144,6 +144,7 @@ class Schema {
     const std::vector<ColumnId>& column_ids() const { return _col_ids; }
     const std::vector<int32_t>& unique_ids() const { return _unique_ids; }
     ColumnId column_id(size_t index) const { return _col_ids[index]; }
+    ColumnId column_id(const std::string& cname) const { return _field_name_to_col_id.at(cname); }

Review Comment:
   why need column name here?  column name is not right during light weight schema change. For example, user call alter table rename column



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1116608487


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1160,58 +1189,86 @@ void SegmentIterator::_vec_init_lazy_materialization() {
         _is_need_short_eval = true;
     }
 
-    // Step 2: check non-predicate read costs to determine whether need lazy materialization
-    // fill _non_predicate_columns.
-    // After some optimization, we suppose lazy materialization is better performance.
+    // make _schema_block_id_map
+    _schema_block_id_map.resize(_schema.columns().size());
+    for (int i = 0; i < _schema.num_column_ids(); i++) {
+        auto cid = _schema.column_id(i);
+        _schema_block_id_map[cid] = i;
+    }
+
+    // Step2: extract columns that can execute expr context
+    _is_remaining_expr_column.resize(_schema.columns().size(), false);
+    if (_enable_remaining_expr_pushdown && _remaining_vconjunct_root != nullptr) {
+        _extract_remaining_expr_columns(_remaining_vconjunct_root);
+        if (!_remaining_expr_columns.empty()) {
+            _is_need_expr_eval = true;
+            for (auto cid : _schema.column_ids()) {
+                // pred column also needs to be filtered by expr
+                if (_is_remaining_expr_column[cid] || _is_pred_column[cid]) {
+                    auto loc = _schema_block_id_map[cid];
+                    _columns_to_filter.push_back(loc);
+                }
+            }
+        }
+    }
+
+    // Step 3: fill non predicate columns and second read columns
     if (_schema.column_ids().size() > pred_column_ids.size()) {
         for (auto cid : _schema.column_ids()) {
             if (!_is_pred_column[cid]) {
-                _non_predicate_columns.push_back(cid);
+                if (!_is_remaining_expr_column[cid]) {
+                    _non_predicate_columns.push_back(cid);
+                } else {
+                    _second_read_column_ids.push_back(cid);
+                }
                 if (_is_need_vec_eval || _is_need_short_eval) {
                     _lazy_materialization_read = true;
                 }
             }
         }
     }
 
-    // Step 3: fill column ids for read and output
+    // Step 4: fill first read columns
     if (_lazy_materialization_read) {
         // insert pred cid to first_read_columns
         for (auto cid : pred_column_ids) {
             _first_read_column_ids.push_back(cid);
         }
-    } else if (!_is_need_vec_eval &&
-               !_is_need_short_eval) { // no pred exists, just read and output column
+    } else if (!_is_need_vec_eval && !_is_need_short_eval &&
+               !_is_need_expr_eval) { // no pred exists, just read and output column
         for (int i = 0; i < _schema.num_column_ids(); i++) {
             auto cid = _schema.column_id(i);
             _first_read_column_ids.push_back(cid);
         }
-    } else { // pred exits, but we can eliminate lazy materialization
-        // insert pred/non-pred cid to first read columns
-        std::set<ColumnId> pred_id_set;
-        pred_id_set.insert(_short_cir_pred_column_ids.begin(), _short_cir_pred_column_ids.end());
-        pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
-        std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
-                                        _non_predicate_columns.end());
-
-        for (int i = 0; i < _schema.num_column_ids(); i++) {
-            auto cid = _schema.column_id(i);
-            if (pred_id_set.find(cid) != pred_id_set.end()) {
-                _first_read_column_ids.push_back(cid);
-            } else if (non_pred_set.find(cid) != non_pred_set.end()) {
+    } else {
+        if (_is_need_vec_eval || _is_need_short_eval) {
+            // TODO To refactor, because we suppose lazy materialization is better performance.
+            // pred exits, but we can eliminate lazy materialization
+            // insert pred/non-pred cid to first read columns
+            std::set<ColumnId> pred_id_set;
+            pred_id_set.insert(_short_cir_pred_column_ids.begin(),
+                               _short_cir_pred_column_ids.end());
+            pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
+            std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
+                                            _non_predicate_columns.end());
+
+            for (int i = 0; i < _schema.num_column_ids(); i++) {
+                auto cid = _schema.column_id(i);
+                if (pred_id_set.find(cid) != pred_id_set.end()) {
+                    _first_read_column_ids.push_back(cid);
+                } else if (non_pred_set.find(cid) != non_pred_set.end()) {
+                    _first_read_column_ids.push_back(cid);
+                    // when _lazy_materialization_read = false, non-predicate column should also be filtered by sel idx, so we regard it as pred columns
+                    _is_pred_column[cid] = true;
+                }
+            }
+        } else if (_is_need_expr_eval) {
+            DCHECK(!_is_need_vec_eval && !_is_need_short_eval);
+            for (auto cid : _remaining_expr_columns) {
                 _first_read_column_ids.push_back(cid);
-                // when _lazy_materialization_read = false, non-predicate column should also be filtered by sel idx, so we regard it as pred columns
-                _is_pred_column[cid] = true;
             }

Review Comment:
   In this case `_second_read_column_ids` is empty and there is no common expr pushdown.
   Added a comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115206640


##########
be/src/runtime/runtime_state.h:
##########
@@ -118,6 +118,11 @@ class RuntimeState {
                _query_options.check_overflow_for_decimal;
     }
 
+    bool enable_remaining_expr_pushdown() const {

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1437822391

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1126888802


##########
be/src/vec/core/block.cpp:
##########
@@ -321,13 +322,16 @@ void Block::check_number_of_rows(bool allow_null_columns) const {
         if (rows == -1) {
             rows = size;
         } else if (rows != size) {
-            LOG(FATAL) << fmt::format("Sizes of columns doesn't match: {}:{},{}:{}",
-                                      data.front().name, rows, elem.name, size);
+            LOG(FATAL) << fmt::format("Sizes of columns doesn't match: {}:{},{}:{}, col size: {}",
+                                      data.front().name, rows, elem.name, size, each_col_size());
         }
     }
 }
 
 size_t Block::rows() const {
+    if (_effective_col != INT_MIN) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] k-i-d-d commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "k-i-d-d (via GitHub)" <gi...@apache.org>.
k-i-d-d commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1453448591

   run beut macos


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1460737156

   ./run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1129940414


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1583,31 +1651,117 @@ Status SegmentIterator::next_batch(vectorized::Block* block) {
         for (int i = 0; i < block->columns(); i++) {
             auto cid = _schema.column_id(i);
             // todo(wb) abstract make column where
-            if (!_is_pred_column[cid]) { // non-predicate
+            if (!_is_pred_column[cid]) {
                 block->replace_by_position(i, std::move(_current_return_columns[cid]));
             }
         }
         block->clear_column_data();
         return Status::EndOfFile("no more data in segment");
     }
 
-    if (!_is_need_vec_eval && !_is_need_short_eval) {
-        _replace_version_col(_current_batch_rows_read);
+    if (!_is_need_vec_eval && !_is_need_short_eval && !_is_need_expr_eval) {
         _output_non_pred_columns(block);
         _output_index_result_column(nullptr, 0, block);
     } else {
-        _convert_dict_code_for_predicate_if_necessary();
         uint16_t selected_size = _current_batch_rows_read;
         uint16_t sel_rowid_idx[selected_size];
 
-        // step 1: evaluate vectorization predicate
-        selected_size = _evaluate_vectorization_predicate(sel_rowid_idx, selected_size);
+        if (_is_need_vec_eval || _is_need_short_eval) {
+            _convert_dict_code_for_predicate_if_necessary();
+
+            // step 1: evaluate vectorization predicate
+            selected_size = _evaluate_vectorization_predicate(sel_rowid_idx, selected_size);
+
+            // step 2: evaluate short circuit predicate
+            // todo(wb) research whether need to read short predicate after vectorization evaluation
+            //          to reduce cost of read short circuit columns.
+            //          In SSB test, it make no difference; So need more scenarios to test
+            selected_size = _evaluate_short_circuit_predicate(sel_rowid_idx, selected_size);
 
-        // step 2: evaluate short circuit predicate
-        // todo(wb) research whether need to read short predicate after vectorization evaluation
-        //          to reduce cost of read short circuit columns.
-        //          In SSB test, it make no difference; So need more scenarios to test
-        selected_size = _evaluate_short_circuit_predicate(sel_rowid_idx, selected_size);
+            if (selected_size > 0) {
+                // step 3.1: output short circuit and predicate column
+                // when lazy materialization enables, _first_read_column_ids = distinct(_short_cir_pred_column_ids + _vec_pred_column_ids)
+                // see _vec_init_lazy_materialization
+                // todo(wb) need to tell input columnids from output columnids
+                RETURN_IF_ERROR(_output_column_by_sel_idx(block, _first_read_column_ids,
+                                                          sel_rowid_idx, selected_size));
+
+                // step 3.2: read remaining expr column and evaluate it.
+                if (_is_need_expr_eval) {
+                    // The predicate column contains the remaining expr column, no need second read.
+                    if (!_second_read_column_ids.empty()) {
+                        SCOPED_RAW_TIMER(&_opts.stats->second_read_ns);
+                        RETURN_IF_ERROR(_read_columns_by_rowids(
+                                _second_read_column_ids, _block_rowids, sel_rowid_idx,
+                                selected_size, &_current_return_columns));
+                        if (std::find(_second_read_column_ids.begin(),
+                                      _second_read_column_ids.end(),
+                                      _schema.version_col_idx()) != _second_read_column_ids.end()) {
+                            _replace_version_col(selected_size);
+                        }
+                        for (auto cid : _second_read_column_ids) {
+                            auto loc = _schema_block_id_map[cid];
+                            block->replace_by_position(loc,
+                                                       std::move(_current_return_columns[cid]));
+                        }
+                    }
+
+                    DCHECK(block->columns() > _schema_block_id_map[*_common_expr_columns.begin()]);
+                    // block->rows() takes the size of the first column by default. If the first column is no predicate column,
+                    // it has not been read yet. add a const column that has been read to calculate rows().
+                    if (block->rows() == 0) {
+                        vectorized::MutableColumnPtr col0 =
+                                std::move(*block->get_by_position(0).column).mutate();

Review Comment:
   warning: std::move of the const expression has no effect; remove std::move() [performance-move-const-arg]
   
   ```suggestion
                                   *block->get_by_position(0).column.mutate();
   ```
   



##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1583,31 +1651,117 @@
         for (int i = 0; i < block->columns(); i++) {
             auto cid = _schema.column_id(i);
             // todo(wb) abstract make column where
-            if (!_is_pred_column[cid]) { // non-predicate
+            if (!_is_pred_column[cid]) {
                 block->replace_by_position(i, std::move(_current_return_columns[cid]));
             }
         }
         block->clear_column_data();
         return Status::EndOfFile("no more data in segment");
     }
 
-    if (!_is_need_vec_eval && !_is_need_short_eval) {
-        _replace_version_col(_current_batch_rows_read);
+    if (!_is_need_vec_eval && !_is_need_short_eval && !_is_need_expr_eval) {
         _output_non_pred_columns(block);
         _output_index_result_column(nullptr, 0, block);
     } else {
-        _convert_dict_code_for_predicate_if_necessary();
         uint16_t selected_size = _current_batch_rows_read;
         uint16_t sel_rowid_idx[selected_size];
 
-        // step 1: evaluate vectorization predicate
-        selected_size = _evaluate_vectorization_predicate(sel_rowid_idx, selected_size);
+        if (_is_need_vec_eval || _is_need_short_eval) {
+            _convert_dict_code_for_predicate_if_necessary();
+
+            // step 1: evaluate vectorization predicate
+            selected_size = _evaluate_vectorization_predicate(sel_rowid_idx, selected_size);
+
+            // step 2: evaluate short circuit predicate
+            // todo(wb) research whether need to read short predicate after vectorization evaluation
+            //          to reduce cost of read short circuit columns.
+            //          In SSB test, it make no difference; So need more scenarios to test
+            selected_size = _evaluate_short_circuit_predicate(sel_rowid_idx, selected_size);
 
-        // step 2: evaluate short circuit predicate
-        // todo(wb) research whether need to read short predicate after vectorization evaluation
-        //          to reduce cost of read short circuit columns.
-        //          In SSB test, it make no difference; So need more scenarios to test
-        selected_size = _evaluate_short_circuit_predicate(sel_rowid_idx, selected_size);
+            if (selected_size > 0) {
+                // step 3.1: output short circuit and predicate column
+                // when lazy materialization enables, _first_read_column_ids = distinct(_short_cir_pred_column_ids + _vec_pred_column_ids)
+                // see _vec_init_lazy_materialization
+                // todo(wb) need to tell input columnids from output columnids
+                RETURN_IF_ERROR(_output_column_by_sel_idx(block, _first_read_column_ids,
+                                                          sel_rowid_idx, selected_size));
+
+                // step 3.2: read remaining expr column and evaluate it.
+                if (_is_need_expr_eval) {
+                    // The predicate column contains the remaining expr column, no need second read.
+                    if (!_second_read_column_ids.empty()) {
+                        SCOPED_RAW_TIMER(&_opts.stats->second_read_ns);
+                        RETURN_IF_ERROR(_read_columns_by_rowids(
+                                _second_read_column_ids, _block_rowids, sel_rowid_idx,
+                                selected_size, &_current_return_columns));
+                        if (std::find(_second_read_column_ids.begin(),
+                                      _second_read_column_ids.end(),
+                                      _schema.version_col_idx()) != _second_read_column_ids.end()) {
+                            _replace_version_col(selected_size);
+                        }
+                        for (auto cid : _second_read_column_ids) {
+                            auto loc = _schema_block_id_map[cid];
+                            block->replace_by_position(loc,
+                                                       std::move(_current_return_columns[cid]));
+                        }
+                    }
+
+                    DCHECK(block->columns() > _schema_block_id_map[*_common_expr_columns.begin()]);
+                    // block->rows() takes the size of the first column by default. If the first column is no predicate column,
+                    // it has not been read yet. add a const column that has been read to calculate rows().
+                    if (block->rows() == 0) {
+                        vectorized::MutableColumnPtr col0 =
+                                std::move(*block->get_by_position(0).column).mutate();
+                        auto res_column = vectorized::ColumnString::create();
+                        res_column->insert_data("", 0);
+                        auto col_const = vectorized::ColumnConst::create(std::move(res_column),
+                                                                         selected_size);
+                        block->replace_by_position(0, std::move(col_const));
+                        _output_index_result_column(sel_rowid_idx, selected_size, block);
+                        block->shrink_char_type_column_suffix_zero(_char_type_idx_no_0);
+                        RETURN_IF_ERROR(_execute_common_expr(sel_rowid_idx, selected_size, block));
+                        block->replace_by_position(0, std::move(col0));
+                    } else {
+                        _output_index_result_column(sel_rowid_idx, selected_size, block);
+                        block->shrink_char_type_column_suffix_zero(_char_type_idx);
+                        RETURN_IF_ERROR(_execute_common_expr(sel_rowid_idx, selected_size, block));
+                    }
+                }
+            } else if (_is_need_expr_eval) {
+                for (auto cid : _second_read_column_ids) {
+                    auto loc = _schema_block_id_map[cid];
+                    block->replace_by_position(loc, std::move(_current_return_columns[cid]));
+                }
+            }
+        } else if (_is_need_expr_eval) {
+            DCHECK(!_first_read_column_ids.empty());
+            // first read all rows are insert block, initialize sel_rowid_idx to all rows.
+            for (auto cid : _first_read_column_ids) {
+                auto loc = _schema_block_id_map[cid];
+                block->replace_by_position(loc, std::move(_current_return_columns[cid]));
+            }
+            for (uint32_t i = 0; i < selected_size; ++i) {
+                sel_rowid_idx[i] = i;
+            }
+
+            if (block->rows() == 0) {
+                vectorized::MutableColumnPtr col0 =
+                        std::move(*block->get_by_position(0).column).mutate();

Review Comment:
   warning: std::move of the const expression has no effect; remove std::move() [performance-move-const-arg]
   
   ```suggestion
                           *block->get_by_position(0).column.mutate();
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1461187543

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1116627039


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1505,14 +1562,16 @@ Status SegmentIterator::next_batch(vectorized::Block* block) {
 
     _current_batch_rows_read = 0;
     uint32_t nrows_read_limit = _opts.block_row_max;
-    if (UNLIKELY(_estimate_row_size)) {
-        // read 100 rows to estimate average row size
+    if (UNLIKELY(_opts.stats->blocks_load == 0)) {

Review Comment:
   fixed
   // Read up to 100 rows at a time while waiting for the estimated row size.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115188575


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -31,9 +31,18 @@
 #include "util/doris_metrics.h"
 #include "util/key_util.h"
 #include "util/simd/bits.h"
+#include "vec/columns/column.h"
+// #include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+// #include "vec/columns/column_nullable.h"

Review Comment:
   remove these comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1438453116

   run clickbench


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115208124


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -181,7 +190,10 @@ Status SegmentIterator::init(const StorageReadOptions& opts) {
         _output_columns = *(opts.output_columns);
     }
 
-    _remaining_vconjunct_root = opts.remaining_vconjunct_root;
+    _remaining_vconjunct_ctx = opts.remaining_vconjunct_ctx;

Review Comment:
   `_remaining_vconjunct_root` = `_remaining_vconjunct_ctx->root()`
   Got `_remaining_vconjunct_root` in `SegmentIterator::init`. Compatible before



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1437713064

   run p0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1436520976

   run p0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115298429


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1160,58 +1189,86 @@ void SegmentIterator::_vec_init_lazy_materialization() {
         _is_need_short_eval = true;
     }
 
-    // Step 2: check non-predicate read costs to determine whether need lazy materialization
-    // fill _non_predicate_columns.
-    // After some optimization, we suppose lazy materialization is better performance.
+    // make _schema_block_id_map
+    _schema_block_id_map.resize(_schema.columns().size());
+    for (int i = 0; i < _schema.num_column_ids(); i++) {
+        auto cid = _schema.column_id(i);
+        _schema_block_id_map[cid] = i;
+    }
+
+    // Step2: extract columns that can execute expr context
+    _is_remaining_expr_column.resize(_schema.columns().size(), false);
+    if (_enable_remaining_expr_pushdown && _remaining_vconjunct_root != nullptr) {
+        _extract_remaining_expr_columns(_remaining_vconjunct_root);
+        if (!_remaining_expr_columns.empty()) {
+            _is_need_expr_eval = true;
+            for (auto cid : _schema.column_ids()) {
+                // pred column also needs to be filtered by expr
+                if (_is_remaining_expr_column[cid] || _is_pred_column[cid]) {
+                    auto loc = _schema_block_id_map[cid];
+                    _columns_to_filter.push_back(loc);
+                }
+            }
+        }
+    }
+
+    // Step 3: fill non predicate columns and second read columns
     if (_schema.column_ids().size() > pred_column_ids.size()) {
         for (auto cid : _schema.column_ids()) {
             if (!_is_pred_column[cid]) {
-                _non_predicate_columns.push_back(cid);
+                if (!_is_remaining_expr_column[cid]) {
+                    _non_predicate_columns.push_back(cid);
+                } else {
+                    _second_read_column_ids.push_back(cid);
+                }
                 if (_is_need_vec_eval || _is_need_short_eval) {
                     _lazy_materialization_read = true;
                 }
             }
         }
     }
 
-    // Step 3: fill column ids for read and output
+    // Step 4: fill first read columns
     if (_lazy_materialization_read) {
         // insert pred cid to first_read_columns
         for (auto cid : pred_column_ids) {
             _first_read_column_ids.push_back(cid);
         }
-    } else if (!_is_need_vec_eval &&
-               !_is_need_short_eval) { // no pred exists, just read and output column
+    } else if (!_is_need_vec_eval && !_is_need_short_eval &&
+               !_is_need_expr_eval) { // no pred exists, just read and output column
         for (int i = 0; i < _schema.num_column_ids(); i++) {
             auto cid = _schema.column_id(i);
             _first_read_column_ids.push_back(cid);
         }
-    } else { // pred exits, but we can eliminate lazy materialization
-        // insert pred/non-pred cid to first read columns
-        std::set<ColumnId> pred_id_set;
-        pred_id_set.insert(_short_cir_pred_column_ids.begin(), _short_cir_pred_column_ids.end());
-        pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
-        std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
-                                        _non_predicate_columns.end());
-
-        for (int i = 0; i < _schema.num_column_ids(); i++) {
-            auto cid = _schema.column_id(i);
-            if (pred_id_set.find(cid) != pred_id_set.end()) {
-                _first_read_column_ids.push_back(cid);
-            } else if (non_pred_set.find(cid) != non_pred_set.end()) {
+    } else {
+        if (_is_need_vec_eval || _is_need_short_eval) {
+            // TODO To refactor, because we suppose lazy materialization is better performance.
+            // pred exits, but we can eliminate lazy materialization
+            // insert pred/non-pred cid to first read columns
+            std::set<ColumnId> pred_id_set;
+            pred_id_set.insert(_short_cir_pred_column_ids.begin(),
+                               _short_cir_pred_column_ids.end());
+            pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
+            std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
+                                            _non_predicate_columns.end());
+
+            for (int i = 0; i < _schema.num_column_ids(); i++) {
+                auto cid = _schema.column_id(i);
+                if (pred_id_set.find(cid) != pred_id_set.end()) {
+                    _first_read_column_ids.push_back(cid);
+                } else if (non_pred_set.find(cid) != non_pred_set.end()) {
+                    _first_read_column_ids.push_back(cid);
+                    // when _lazy_materialization_read = false, non-predicate column should also be filtered by sel idx, so we regard it as pred columns
+                    _is_pred_column[cid] = true;
+                }
+            }
+        } else if (_is_need_expr_eval) {
+            DCHECK(!_is_need_vec_eval && !_is_need_short_eval);
+            for (auto cid : _remaining_expr_columns) {
                 _first_read_column_ids.push_back(cid);
-                // when _lazy_materialization_read = false, non-predicate column should also be filtered by sel idx, so we regard it as pred columns
-                _is_pred_column[cid] = true;
             }

Review Comment:
   in this case, should clear _second_read_column_ids?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115208124


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -181,7 +190,10 @@ Status SegmentIterator::init(const StorageReadOptions& opts) {
         _output_columns = *(opts.output_columns);
     }
 
-    _remaining_vconjunct_root = opts.remaining_vconjunct_root;
+    _remaining_vconjunct_ctx = opts.remaining_vconjunct_ctx;

Review Comment:
   `_remaining_vconjunct_root` = `_remaining_vconjunct_ctx->root()`
   Got `_remaining_vconjunct_root` in `SegmentIterator::init`. compatible before



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1451535660

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1125981881


##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -642,6 +644,9 @@ public class SessionVariable implements Serializable, Writable {
     @VariableMgr.VarAttr(name = ENABLE_FUNCTION_PUSHDOWN)
     public boolean enableFunctionPushdown = true;
 
+    @VariableMgr.VarAttr(name = ENABLE_COMMON_EXPR_PUSHDOWN, fuzzy = true)
+    public boolean enableCommonExprPushdown = true;

Review Comment:
   should add fuzzy code logic in initFuzzyModeVariables method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1461756367

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1457287840

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1460742913

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] hello-stephen commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1383389282

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 35.77 seconds
    load time: 491 seconds
    storage size: 17122669491 Bytes
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230116024129_clickbench_pr_80700.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1125974726


##########
be/src/vec/exec/scan/vscan_node.cpp:
##########
@@ -458,7 +461,11 @@ Status VScanNode::_normalize_conjuncts() {
             RETURN_IF_ERROR(_normalize_predicate((*_vconjunct_ctx_ptr)->root(), &new_root));
             if (new_root) {
                 (*_vconjunct_ctx_ptr)->set_root(new_root);
-            } else {
+                if (_should_push_down_common_expr()) {
+                    _common_vexpr_ctxs_pushdown = *_vconjunct_ctx_ptr;

Review Comment:
   maybe memory leak here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1459991362

   ./run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1437708130

   run arm


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1116608487


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1160,58 +1189,86 @@ void SegmentIterator::_vec_init_lazy_materialization() {
         _is_need_short_eval = true;
     }
 
-    // Step 2: check non-predicate read costs to determine whether need lazy materialization
-    // fill _non_predicate_columns.
-    // After some optimization, we suppose lazy materialization is better performance.
+    // make _schema_block_id_map
+    _schema_block_id_map.resize(_schema.columns().size());
+    for (int i = 0; i < _schema.num_column_ids(); i++) {
+        auto cid = _schema.column_id(i);
+        _schema_block_id_map[cid] = i;
+    }
+
+    // Step2: extract columns that can execute expr context
+    _is_remaining_expr_column.resize(_schema.columns().size(), false);
+    if (_enable_remaining_expr_pushdown && _remaining_vconjunct_root != nullptr) {
+        _extract_remaining_expr_columns(_remaining_vconjunct_root);
+        if (!_remaining_expr_columns.empty()) {
+            _is_need_expr_eval = true;
+            for (auto cid : _schema.column_ids()) {
+                // pred column also needs to be filtered by expr
+                if (_is_remaining_expr_column[cid] || _is_pred_column[cid]) {
+                    auto loc = _schema_block_id_map[cid];
+                    _columns_to_filter.push_back(loc);
+                }
+            }
+        }
+    }
+
+    // Step 3: fill non predicate columns and second read columns
     if (_schema.column_ids().size() > pred_column_ids.size()) {
         for (auto cid : _schema.column_ids()) {
             if (!_is_pred_column[cid]) {
-                _non_predicate_columns.push_back(cid);
+                if (!_is_remaining_expr_column[cid]) {
+                    _non_predicate_columns.push_back(cid);
+                } else {
+                    _second_read_column_ids.push_back(cid);
+                }
                 if (_is_need_vec_eval || _is_need_short_eval) {
                     _lazy_materialization_read = true;
                 }
             }
         }
     }
 
-    // Step 3: fill column ids for read and output
+    // Step 4: fill first read columns
     if (_lazy_materialization_read) {
         // insert pred cid to first_read_columns
         for (auto cid : pred_column_ids) {
             _first_read_column_ids.push_back(cid);
         }
-    } else if (!_is_need_vec_eval &&
-               !_is_need_short_eval) { // no pred exists, just read and output column
+    } else if (!_is_need_vec_eval && !_is_need_short_eval &&
+               !_is_need_expr_eval) { // no pred exists, just read and output column
         for (int i = 0; i < _schema.num_column_ids(); i++) {
             auto cid = _schema.column_id(i);
             _first_read_column_ids.push_back(cid);
         }
-    } else { // pred exits, but we can eliminate lazy materialization
-        // insert pred/non-pred cid to first read columns
-        std::set<ColumnId> pred_id_set;
-        pred_id_set.insert(_short_cir_pred_column_ids.begin(), _short_cir_pred_column_ids.end());
-        pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
-        std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
-                                        _non_predicate_columns.end());
-
-        for (int i = 0; i < _schema.num_column_ids(); i++) {
-            auto cid = _schema.column_id(i);
-            if (pred_id_set.find(cid) != pred_id_set.end()) {
-                _first_read_column_ids.push_back(cid);
-            } else if (non_pred_set.find(cid) != non_pred_set.end()) {
+    } else {
+        if (_is_need_vec_eval || _is_need_short_eval) {
+            // TODO To refactor, because we suppose lazy materialization is better performance.
+            // pred exits, but we can eliminate lazy materialization
+            // insert pred/non-pred cid to first read columns
+            std::set<ColumnId> pred_id_set;
+            pred_id_set.insert(_short_cir_pred_column_ids.begin(),
+                               _short_cir_pred_column_ids.end());
+            pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
+            std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
+                                            _non_predicate_columns.end());
+
+            for (int i = 0; i < _schema.num_column_ids(); i++) {
+                auto cid = _schema.column_id(i);
+                if (pred_id_set.find(cid) != pred_id_set.end()) {
+                    _first_read_column_ids.push_back(cid);
+                } else if (non_pred_set.find(cid) != non_pred_set.end()) {
+                    _first_read_column_ids.push_back(cid);
+                    // when _lazy_materialization_read = false, non-predicate column should also be filtered by sel idx, so we regard it as pred columns
+                    _is_pred_column[cid] = true;
+                }
+            }
+        } else if (_is_need_expr_eval) {
+            DCHECK(!_is_need_vec_eval && !_is_need_short_eval);
+            for (auto cid : _remaining_expr_columns) {
                 _first_read_column_ids.push_back(cid);
-                // when _lazy_materialization_read = false, non-predicate column should also be filtered by sel idx, so we regard it as pred columns
-                _is_pred_column[cid] = true;
             }

Review Comment:
   In this case `_second_read_column_ids` is empty and there is no common expr pushdown.
   Added a comment and check



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115295379


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1160,58 +1189,86 @@ void SegmentIterator::_vec_init_lazy_materialization() {
         _is_need_short_eval = true;
     }
 
-    // Step 2: check non-predicate read costs to determine whether need lazy materialization
-    // fill _non_predicate_columns.
-    // After some optimization, we suppose lazy materialization is better performance.
+    // make _schema_block_id_map
+    _schema_block_id_map.resize(_schema.columns().size());
+    for (int i = 0; i < _schema.num_column_ids(); i++) {
+        auto cid = _schema.column_id(i);
+        _schema_block_id_map[cid] = i;
+    }
+
+    // Step2: extract columns that can execute expr context
+    _is_remaining_expr_column.resize(_schema.columns().size(), false);
+    if (_enable_remaining_expr_pushdown && _remaining_vconjunct_root != nullptr) {
+        _extract_remaining_expr_columns(_remaining_vconjunct_root);
+        if (!_remaining_expr_columns.empty()) {
+            _is_need_expr_eval = true;
+            for (auto cid : _schema.column_ids()) {
+                // pred column also needs to be filtered by expr
+                if (_is_remaining_expr_column[cid] || _is_pred_column[cid]) {
+                    auto loc = _schema_block_id_map[cid];
+                    _columns_to_filter.push_back(loc);
+                }
+            }
+        }
+    }
+
+    // Step 3: fill non predicate columns and second read columns
     if (_schema.column_ids().size() > pred_column_ids.size()) {
         for (auto cid : _schema.column_ids()) {
             if (!_is_pred_column[cid]) {
-                _non_predicate_columns.push_back(cid);
+                if (!_is_remaining_expr_column[cid]) {
+                    _non_predicate_columns.push_back(cid);
+                } else {
+                    _second_read_column_ids.push_back(cid);
+                }
                 if (_is_need_vec_eval || _is_need_short_eval) {
                     _lazy_materialization_read = true;
                 }
             }
         }
     }
 

Review Comment:
   there are  3 if else here. I think the code is right, but it too hard to think it correctly. 
   Better add a comment here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115199778


##########
be/src/vec/core/block.h:
##########
@@ -75,6 +75,7 @@ class Block {
     IndexByName index_by_name;
     std::vector<bool> row_same_bit;
 

Review Comment:
   fixed, add `set_effective_col` comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1445706069

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1069466472


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1107,7 +1140,7 @@ void SegmentIterator::_init_current_block(
             i >= block->columns()) { //todo(wb) maybe we can release it after output block
             current_columns[cid]->clear();
         } else { // non-predicate column
-            current_columns[cid] = std::move(*block->get_by_position(i).column).mutate();
+            current_columns[cid] = std::move(*block->get_by_position(i).column).assume_mutable();

Review Comment:
   warning: std::move of the const expression has no effect; remove std::move() [performance-move-const-arg]
   
   ```suggestion
               current_columns[cid] = *block->get_by_position(i).column.assume_mutable();
   ```
   



##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1414,6 +1481,115 @@
     return Status::OK();
 }
 
+uint16_t SegmentIterator::_execute_remaining_expr(uint16_t* sel_rowid_idx, uint16_t selected_size,
+                                                  vectorized::Block* block) {
+    if (_remaining_vconjunct_root != nullptr && block->rows() != 0) {
+        size_t prev_columns = block->columns();
+        int result_column_id = -1;
+        RETURN_IF_ERROR(_remaining_vconjunct_ctx->execute(block, &result_column_id));
+        vectorized::ColumnPtr filter_column = block->get_by_position(result_column_id).column;
+        if (auto* nullable_column =
+                    vectorized::check_and_get_column<vectorized::ColumnNullable>(*filter_column)) {
+            vectorized::ColumnPtr nested_column = nullable_column->get_nested_column_ptr();
+
+            vectorized::MutableColumnPtr mutable_holder =
+                    nested_column->use_count() == 1
+                            ? nested_column->assume_mutable()
+                            : nested_column->clone_resized(nested_column->size());
+
+            vectorized::ColumnUInt8* concrete_column =
+                    typeid_cast<vectorized::ColumnUInt8*>(mutable_holder.get());
+            if (!concrete_column) {
+                return Status::InvalidArgument(
+                        "Illegal type {} of column for filter. Must be UInt8 or Nullable(UInt8).",
+                        filter_column->get_name());
+            }
+            auto* __restrict null_map = nullable_column->get_null_map_data().data();
+            vectorized::IColumn::Filter& filter = concrete_column->get_data();
+            auto* __restrict filter_data = filter.data();
+
+            const size_t size = filter.size();
+            for (size_t i = 0; i < size; ++i) {
+                filter_data[i] &= !null_map[i];
+            }
+
+            selected_size = _evaluate_remaining_expr_filter(sel_rowid_idx, selected_size, filter);
+            vectorized::Block::filter_block_internal(block, _columns_to_filter, filter);
+        } else if (auto* const_column = vectorized::check_and_get_column<vectorized::ColumnConst>(
+                           *filter_column)) {
+            bool ret = const_column->get_bool(0);
+            if (!ret) {
+                selected_size = 0;
+                for (auto& col : _columns_to_filter) {
+                    std::move(*block->get_by_position(col).column).assume_mutable()->clear();

Review Comment:
   warning: std::move of the const expression has no effect; remove std::move() [performance-move-const-arg]
   
   ```suggestion
                       *block->get_by_position(col).column.assume_mutable()->clear();
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115207019


##########
be/src/vec/olap/vcollect_iterator.cpp:
##########
@@ -285,11 +285,6 @@ Status VCollectIterator::_topn_next(Block* block) {
 
             auto col_name = block->get_names()[first_sort_column_idx];
 
-            // filter block
-            RETURN_IF_ERROR(VExprContext::filter_block(

Review Comment:
   Checked with them, can be replaced



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1454967771

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1126835793


##########
fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java:
##########
@@ -642,6 +644,9 @@ public class SessionVariable implements Serializable, Writable {
     @VariableMgr.VarAttr(name = ENABLE_FUNCTION_PUSHDOWN)
     public boolean enableFunctionPushdown = true;
 
+    @VariableMgr.VarAttr(name = ENABLE_COMMON_EXPR_PUSHDOWN, fuzzy = true)
+    public boolean enableCommonExprPushdown = true;

Review Comment:
   done



##########
be/src/vec/exec/scan/vscan_node.cpp:
##########
@@ -458,7 +461,11 @@ Status VScanNode::_normalize_conjuncts() {
             RETURN_IF_ERROR(_normalize_predicate((*_vconjunct_ctx_ptr)->root(), &new_root));
             if (new_root) {
                 (*_vconjunct_ctx_ptr)->set_root(new_root);
-            } else {
+                if (_should_push_down_common_expr()) {
+                    _common_vexpr_ctxs_pushdown = *_vconjunct_ctx_ptr;

Review Comment:
   fix



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1126011291


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1196,58 +1220,91 @@ void SegmentIterator::_vec_init_lazy_materialization() {
         _is_need_short_eval = true;
     }
 
-    // Step 2: check non-predicate read costs to determine whether need lazy materialization
-    // fill _non_predicate_columns.
-    // After some optimization, we suppose lazy materialization is better performance.
+    // make _schema_block_id_map
+    _schema_block_id_map.resize(_schema.columns().size());
+    for (int i = 0; i < _schema.num_column_ids(); i++) {
+        auto cid = _schema.column_id(i);
+        _schema_block_id_map[cid] = i;
+    }
+
+    // Step2: extract columns that can execute expr context
+    _is_common_expr_column.resize(_schema.columns().size(), false);
+    if (_enable_common_expr_pushdown && _remaining_vconjunct_root != nullptr) {
+        _extract_common_expr_columns(_remaining_vconjunct_root);
+        if (!_common_expr_columns.empty()) {
+            _is_need_expr_eval = true;
+            for (auto cid : _schema.column_ids()) {
+                // pred column also needs to be filtered by expr
+                if (_is_common_expr_column[cid] || _is_pred_column[cid]) {
+                    auto loc = _schema_block_id_map[cid];
+                    _columns_to_filter.push_back(loc);
+                }
+            }
+        }
+    }
+
+    // Step 3: fill non predicate columns and second read column
+    // if _schema columns size equal to pred_column_ids size, lazy_materialization_read is false,
+    // all columns are lazy materialization columns without non predicte column.
+    // If common expr pushdown exists, and expr column is not contained in lazy materialization columns,
+    // add to second read column, which will be read after lazy materialization
     if (_schema.column_ids().size() > pred_column_ids.size()) {
         for (auto cid : _schema.column_ids()) {
             if (!_is_pred_column[cid]) {
-                _non_predicate_columns.push_back(cid);
                 if (_is_need_vec_eval || _is_need_short_eval) {
                     _lazy_materialization_read = true;
                 }
+                if (!_is_common_expr_column[cid]) {
+                    _non_predicate_columns.push_back(cid);
+                } else {
+                    _second_read_column_ids.push_back(cid);
+                }
             }
         }
     }
 
-    // Step 3: fill column ids for read and output
+    // Step 4: fill first read columns
     if (_lazy_materialization_read) {
         // insert pred cid to first_read_columns
         for (auto cid : pred_column_ids) {
             _first_read_column_ids.push_back(cid);
         }
-    } else if (!_is_need_vec_eval &&
-               !_is_need_short_eval) { // no pred exists, just read and output column
+    } else if (!_is_need_vec_eval && !_is_need_short_eval &&
+               !_is_need_expr_eval) { // no pred exists, just read and output column
         for (int i = 0; i < _schema.num_column_ids(); i++) {
             auto cid = _schema.column_id(i);
             _first_read_column_ids.push_back(cid);
         }
-    } else { // pred exits, but we can eliminate lazy materialization
-        // insert pred/non-pred cid to first read columns
-        std::set<ColumnId> pred_id_set;
-        pred_id_set.insert(_short_cir_pred_column_ids.begin(), _short_cir_pred_column_ids.end());
-        pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
-        std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
-                                        _non_predicate_columns.end());
-
-        for (int i = 0; i < _schema.num_column_ids(); i++) {
-            auto cid = _schema.column_id(i);
-            if (pred_id_set.find(cid) != pred_id_set.end()) {
-                _first_read_column_ids.push_back(cid);
-            } else if (non_pred_set.find(cid) != non_pred_set.end()) {
+    } else {
+        if (_is_need_vec_eval || _is_need_short_eval) {
+            // TODO To refactor, because we suppose lazy materialization is better performance.
+            // pred exits, but we can eliminate lazy materialization
+            // insert pred/non-pred cid to first read columns
+            std::set<ColumnId> pred_id_set;
+            pred_id_set.insert(_short_cir_pred_column_ids.begin(),
+                               _short_cir_pred_column_ids.end());
+            pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
+            std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
+                                            _non_predicate_columns.end());
+
+            // _second_read_column_ids must be empty. Otherwise _lazy_materialization_read must not false.

Review Comment:
   ADD CHECK(!_second_read_column_ids.empty())



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1457287770

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1116582204


##########
be/src/vec/exec/scan/vscanner.cpp:
##########
@@ -67,12 +67,12 @@ Status VScanner::get_block(RuntimeState* state, Block* block, bool* eof) {
             }
 
             // 2. Filter the output block finally.
-            {
+            if (!_enable_remaining_expr_pushdown) {
                 SCOPED_TIMER(_parent->_filter_timer);

Review Comment:
   Copy `_vconjunct_ctx_ptr` to `_common_vexpr_ctxs_pushdown` and push down, mark `_vconjunct_ctx_ptr` as stale



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1459024880

   ./run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1461138128

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1453261945

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1437968857

   run clickbench


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115143898


##########
be/src/vec/olap/vcollect_iterator.cpp:
##########
@@ -285,11 +285,6 @@ Status VCollectIterator::_topn_next(Block* block) {
 
             auto col_name = block->get_names()[first_sort_column_idx];
 
-            // filter block
-            RETURN_IF_ERROR(VExprContext::filter_block(

Review Comment:
   Do you check this with hangyu? it is useless?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115142017


##########
be/src/vec/core/block.h:
##########
@@ -75,6 +75,7 @@ class Block {
     IndexByName index_by_name;
     std::vector<bool> row_same_bit;
 

Review Comment:
   Add some comment here to describe why need a _effective_col



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115188291


##########
be/src/vec/exec/scan/vscanner.cpp:
##########
@@ -67,12 +67,12 @@ Status VScanner::get_block(RuntimeState* state, Block* block, bool* eof) {
             }
 
             // 2. Filter the output block finally.
-            {
+            if (!_enable_remaining_expr_pushdown) {
                 SCOPED_TIMER(_parent->_filter_timer);

Review Comment:
   Maybe a problem here. Runtime filter comes after scanner start. Then runtime filter will never take affect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115208124


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -181,7 +190,10 @@ Status SegmentIterator::init(const StorageReadOptions& opts) {
         _output_columns = *(opts.output_columns);
     }
 
-    _remaining_vconjunct_root = opts.remaining_vconjunct_root;
+    _remaining_vconjunct_ctx = opts.remaining_vconjunct_ctx;

Review Comment:
   `_remaining_vconjunct_root` = `_remaining_vconjunct_ctx->root()`
   Compatible with previously pushed down expr on the index.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1101401156


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1452,6 +1536,119 @@ Status SegmentIterator::next_batch(vectorized::Block* block) {
     return Status::OK();
 }
 
+uint16_t SegmentIterator::_execute_remaining_expr(uint16_t* sel_rowid_idx, uint16_t selected_size,
+                                                  vectorized::Block* block) {
+    SCOPED_RAW_TIMER(&_opts.stats->expr_filter_ns);
+    DCHECK(_remaining_vconjunct_root != nullptr);
+    DCHECK(block->rows() != 0);
+    size_t prev_columns = block->columns();
+    int result_column_id = -1;
+    RETURN_IF_ERROR(_remaining_vconjunct_ctx->execute(block, &result_column_id));
+    vectorized::ColumnPtr filter_column = block->get_by_position(result_column_id).column;
+    if (auto* nullable_column =
+                vectorized::check_and_get_column<vectorized::ColumnNullable>(*filter_column)) {
+        vectorized::ColumnPtr nested_column = nullable_column->get_nested_column_ptr();
+
+        vectorized::MutableColumnPtr mutable_holder =
+                nested_column->use_count() == 1
+                        ? nested_column->assume_mutable()
+                        : nested_column->clone_resized(nested_column->size());
+
+        vectorized::ColumnUInt8* concrete_column =
+                typeid_cast<vectorized::ColumnUInt8*>(mutable_holder.get());
+        if (!concrete_column) {
+            return Status::InvalidArgument(
+                    "Illegal type {} of column for filter. Must be UInt8 or Nullable(UInt8).",
+                    filter_column->get_name());
+        }
+        auto* __restrict null_map = nullable_column->get_null_map_data().data();
+        vectorized::IColumn::Filter& filter = concrete_column->get_data();
+        auto* __restrict filter_data = filter.data();
+
+        const size_t size = filter.size();
+        for (size_t i = 0; i < size; ++i) {
+            filter_data[i] &= !null_map[i];
+        }
+
+        selected_size = _evaluate_remaining_expr_filter(sel_rowid_idx, selected_size, filter);
+        vectorized::Block::filter_block_internal(block, _columns_to_filter, filter);
+    } else if (auto* const_column = vectorized::check_and_get_column<vectorized::ColumnConst>(
+                        *filter_column)) {
+        bool ret = const_column->get_bool(0);
+        if (!ret) {
+            for (auto& col : _columns_to_filter) {
+                std::move(*block->get_by_position(col).column).assume_mutable()->clear();

Review Comment:
   warning: std::move of the const expression has no effect; remove std::move() [performance-move-const-arg]
   
   ```suggestion
                   *block->get_by_position(col).column.assume_mutable()->clear();
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115188859


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -181,7 +190,10 @@ Status SegmentIterator::init(const StorageReadOptions& opts) {
         _output_columns = *(opts.output_columns);
     }
 
-    _remaining_vconjunct_root = opts.remaining_vconjunct_root;
+    _remaining_vconjunct_ctx = opts.remaining_vconjunct_ctx;

Review Comment:
   any difference between _remaining_vconjunct_ctx and _remaining_vconjunct_root?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115207151


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -31,9 +31,18 @@
 #include "util/doris_metrics.h"
 #include "util/key_util.h"
 #include "util/simd/bits.h"
+#include "vec/columns/column.h"
+// #include "vec/columns/column_array.h"
+#include "vec/columns/column_const.h"
+// #include "vec/columns/column_nullable.h"

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1115300380


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1505,14 +1562,16 @@ Status SegmentIterator::next_batch(vectorized::Block* block) {
 
     _current_batch_rows_read = 0;
     uint32_t nrows_read_limit = _opts.block_row_max;
-    if (UNLIKELY(_estimate_row_size)) {
-        // read 100 rows to estimate average row size
+    if (UNLIKELY(_opts.stats->blocks_load == 0)) {

Review Comment:
   Not depend on the stats, add another variable to indicate that this is first read.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1116608487


##########
be/src/olap/rowset/segment_v2/segment_iterator.cpp:
##########
@@ -1160,58 +1189,86 @@ void SegmentIterator::_vec_init_lazy_materialization() {
         _is_need_short_eval = true;
     }
 
-    // Step 2: check non-predicate read costs to determine whether need lazy materialization
-    // fill _non_predicate_columns.
-    // After some optimization, we suppose lazy materialization is better performance.
+    // make _schema_block_id_map
+    _schema_block_id_map.resize(_schema.columns().size());
+    for (int i = 0; i < _schema.num_column_ids(); i++) {
+        auto cid = _schema.column_id(i);
+        _schema_block_id_map[cid] = i;
+    }
+
+    // Step2: extract columns that can execute expr context
+    _is_remaining_expr_column.resize(_schema.columns().size(), false);
+    if (_enable_remaining_expr_pushdown && _remaining_vconjunct_root != nullptr) {
+        _extract_remaining_expr_columns(_remaining_vconjunct_root);
+        if (!_remaining_expr_columns.empty()) {
+            _is_need_expr_eval = true;
+            for (auto cid : _schema.column_ids()) {
+                // pred column also needs to be filtered by expr
+                if (_is_remaining_expr_column[cid] || _is_pred_column[cid]) {
+                    auto loc = _schema_block_id_map[cid];
+                    _columns_to_filter.push_back(loc);
+                }
+            }
+        }
+    }
+
+    // Step 3: fill non predicate columns and second read columns
     if (_schema.column_ids().size() > pred_column_ids.size()) {
         for (auto cid : _schema.column_ids()) {
             if (!_is_pred_column[cid]) {
-                _non_predicate_columns.push_back(cid);
+                if (!_is_remaining_expr_column[cid]) {
+                    _non_predicate_columns.push_back(cid);
+                } else {
+                    _second_read_column_ids.push_back(cid);
+                }
                 if (_is_need_vec_eval || _is_need_short_eval) {
                     _lazy_materialization_read = true;
                 }
             }
         }
     }
 
-    // Step 3: fill column ids for read and output
+    // Step 4: fill first read columns
     if (_lazy_materialization_read) {
         // insert pred cid to first_read_columns
         for (auto cid : pred_column_ids) {
             _first_read_column_ids.push_back(cid);
         }
-    } else if (!_is_need_vec_eval &&
-               !_is_need_short_eval) { // no pred exists, just read and output column
+    } else if (!_is_need_vec_eval && !_is_need_short_eval &&
+               !_is_need_expr_eval) { // no pred exists, just read and output column
         for (int i = 0; i < _schema.num_column_ids(); i++) {
             auto cid = _schema.column_id(i);
             _first_read_column_ids.push_back(cid);
         }
-    } else { // pred exits, but we can eliminate lazy materialization
-        // insert pred/non-pred cid to first read columns
-        std::set<ColumnId> pred_id_set;
-        pred_id_set.insert(_short_cir_pred_column_ids.begin(), _short_cir_pred_column_ids.end());
-        pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
-        std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
-                                        _non_predicate_columns.end());
-
-        for (int i = 0; i < _schema.num_column_ids(); i++) {
-            auto cid = _schema.column_id(i);
-            if (pred_id_set.find(cid) != pred_id_set.end()) {
-                _first_read_column_ids.push_back(cid);
-            } else if (non_pred_set.find(cid) != non_pred_set.end()) {
+    } else {
+        if (_is_need_vec_eval || _is_need_short_eval) {
+            // TODO To refactor, because we suppose lazy materialization is better performance.
+            // pred exits, but we can eliminate lazy materialization
+            // insert pred/non-pred cid to first read columns
+            std::set<ColumnId> pred_id_set;
+            pred_id_set.insert(_short_cir_pred_column_ids.begin(),
+                               _short_cir_pred_column_ids.end());
+            pred_id_set.insert(_vec_pred_column_ids.begin(), _vec_pred_column_ids.end());
+            std::set<ColumnId> non_pred_set(_non_predicate_columns.begin(),
+                                            _non_predicate_columns.end());
+
+            for (int i = 0; i < _schema.num_column_ids(); i++) {
+                auto cid = _schema.column_id(i);
+                if (pred_id_set.find(cid) != pred_id_set.end()) {
+                    _first_read_column_ids.push_back(cid);
+                } else if (non_pred_set.find(cid) != non_pred_set.end()) {
+                    _first_read_column_ids.push_back(cid);
+                    // when _lazy_materialization_read = false, non-predicate column should also be filtered by sel idx, so we regard it as pred columns
+                    _is_pred_column[cid] = true;
+                }
+            }
+        } else if (_is_need_expr_eval) {
+            DCHECK(!_is_need_vec_eval && !_is_need_short_eval);
+            for (auto cid : _remaining_expr_columns) {
                 _first_read_column_ids.push_back(cid);
-                // when _lazy_materialization_read = false, non-predicate column should also be filtered by sel idx, so we regard it as pred columns
-                _is_pred_column[cid] = true;
             }

Review Comment:
   In this case `_second_read_column_ids` must be empty.
   Otherwise `_lazy_materialization_read` must not false and will enter the first `if`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1437706038

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1437740584

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1122787631


##########
be/src/vec/exec/scan/vscan_node.cpp:
##########
@@ -458,7 +461,15 @@ Status VScanNode::_normalize_conjuncts() {
             RETURN_IF_ERROR(_normalize_predicate((*_vconjunct_ctx_ptr)->root(), &new_root));
             if (new_root) {
                 (*_vconjunct_ctx_ptr)->set_root(new_root);
-            } else {
+                if (_should_push_down_common_expr()) {
+                    // Copy _vconjunct_ctx_ptr to _common_vexpr_ctxs_pushdown and push down, mark _vconjunct_ctx_ptr as stale.

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei commented on a diff in pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei commented on code in PR #15917:
URL: https://github.com/apache/doris/pull/15917#discussion_r1125980962


##########
be/src/vec/core/block.cpp:
##########
@@ -321,13 +322,16 @@ void Block::check_number_of_rows(bool allow_null_columns) const {
         if (rows == -1) {
             rows = size;
         } else if (rows != size) {
-            LOG(FATAL) << fmt::format("Sizes of columns doesn't match: {}:{},{}:{}",
-                                      data.front().name, rows, elem.name, size);
+            LOG(FATAL) << fmt::format("Sizes of columns doesn't match: {}:{},{}:{}, col size: {}",
+                                      data.front().name, rows, elem.name, size, each_col_size());
         }
     }
 }
 
 size_t Block::rows() const {
+    if (_effective_col != INT_MIN) {

Review Comment:
   use column const during filter block. and then replace column after filter block.
   Then do not need add this effective col any more.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] xinyiZzz commented on pull request #15917: [improvement](scan) Support pushdown execute expr ctx

Posted by "xinyiZzz (via GitHub)" <gi...@apache.org>.
xinyiZzz commented on PR #15917:
URL: https://github.com/apache/doris/pull/15917#issuecomment-1456900944

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org