Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/06/23 04:11:31 UTC

[GitHub] [doris] compasses opened a new pull request, #10355: Improve performance like/not like filter through pushdown function to storage engine

compasses opened a new pull request, #10355:
URL: https://github.com/apache/doris/pull/10355

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   To improve the performance of like/not like string matching, this PR pushes the function down to the storage engine. Tests show a 2x-3x performance gain.
   
   ```
   select sum(lo_quantity),sum(lo_extendedprice),lo_orderdate from lineorder where lo_orderpriority like '%MED%' group by lo_orderdate order by lo_orderdate;
   ```
   
   vectorized:
   before: ~0.6s, after: ~0.3s
   
   not vectorized:
   before: ~3s, after: ~1.2s
   
   Data filtered in the segment reader:
   
   ![image](https://user-images.githubusercontent.com/10161171/175207195-b01aa536-f346-42dd-99c5-db395cb78f73.png)
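The pushdown changes where the LIKE predicate is evaluated (inside the segment reader instead of after rows are materialized), not what it means. As a reminder of those semantics, here is a minimal self-contained C++ sketch of a LIKE matcher for the `'%MED%'` filter above. It is an illustration only, not Doris's implementation (which dispatches through `LikePredicateState::function`); the `like_match` name is hypothetical, and escape characters and collations are ignored.

```cpp
#include <cstddef>
#include <string_view>

// Minimal SQL LIKE matcher: '%' matches any run of characters (including an
// empty one) and '_' matches exactly one character. Escapes are not handled.
bool like_match(std::string_view text, std::string_view pattern) {
    std::size_t t = 0, p = 0;
    std::size_t star_p = std::string_view::npos;  // pattern index of the last '%'
    std::size_t star_t = 0;                        // text index when that '%' was seen
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '%') {
            // Remember the wildcard and first try matching it against nothing.
            star_p = p++;
            star_t = t;
        } else if (p < pattern.size() && (pattern[p] == '_' || pattern[p] == text[t])) {
            ++t;
            ++p;
        } else if (star_p != std::string_view::npos) {
            // Backtrack: let the last '%' swallow one more character.
            p = star_p + 1;
            t = ++star_t;
        } else {
            return false;
        }
    }
    // Any trailing '%' can match the empty remainder.
    while (p < pattern.size() && pattern[p] == '%') {
        ++p;
    }
    return p == pattern.size();
}
```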
   
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (No)
   2. Has unit tests been added: (No Need)
   3. Has document been added or modified: (No Need)
   4. Does it need to update dependencies: (No)
   5. Are there any changes that cannot be rolled back: (No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Gabriel39 commented on a diff in pull request #10355: Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
Gabriel39 commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r904728074


##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -729,6 +729,72 @@ Status VOlapScanNode::build_olap_filters() {
     return Status::OK();
 }
 
+Status VOlapScanNode::build_function_filters() {
+    for (int conj_idx = 0; conj_idx < _conjunct_ctxs.size(); ++conj_idx) {
+        ExprContext* ex_ctx = _conjunct_ctxs[conj_idx];
+        Expr* fn_expr = ex_ctx->root();
+        bool opposite = false;
+
+        if (TExprNodeType::COMPOUND_PRED == fn_expr->node_type())
+        {
+            if (TExprOpcode::COMPOUND_NOT == fn_expr->op())

Review Comment:
   combine these two conditions into one
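A minimal sketch of the suggested change. It uses hypothetical stand-in types (`NodeType`, `Opcode`, `Expr`, `unwrap_not`) instead of Doris's real `TExprNodeType`, `TExprOpcode` and `Expr`; in the real code the combined check would read along the lines of `TExprNodeType::COMPOUND_PRED == fn_expr->node_type() && TExprOpcode::COMPOUND_NOT == fn_expr->op()`.

```cpp
// Hypothetical stand-ins for the Thrift enums and the Expr class in the diff.
enum class NodeType { COMPOUND_PRED, FUNCTION_CALL, SLOT_REF };
enum class Opcode { COMPOUND_NOT, COMPOUND_AND, INVALID };

struct Expr {
    NodeType type;
    Opcode op;
    Expr* child0;  // first child, or nullptr
};

// Strips a leading NOT with a single combined condition instead of two nested
// ifs, and reports through `opposite` whether a NOT was removed.
Expr* unwrap_not(Expr* fn_expr, bool* opposite) {
    *opposite = false;
    if (fn_expr->type == NodeType::COMPOUND_PRED && fn_expr->op == Opcode::COMPOUND_NOT) {
        *opposite = true;
        return fn_expr->child0;
    }
    return fn_expr;
}
```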



##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -772,17 +838,22 @@ Status VOlapScanNode::start_scan(RuntimeState* state) {
     }
 
     VLOG_CRITICAL << "BuildOlapFilters";
-    // 3. Using ColumnValueRange to Build StorageEngine filters
+    // 3.1 Using ColumnValueRange to Build StorageEngine filters
     RETURN_IF_ERROR(build_olap_filters());
+    // 3.2 Function pushdown
+    if (config::enable_function_pushdown)
+        RETURN_IF_ERROR(build_function_filters());
 
     VLOG_CRITICAL << "BuildScanKey";
     // 4. Using `Key Column`'s ColumnValueRange to split ScanRange to several `Sub ScanRange`
     RETURN_IF_ERROR(build_scan_key());
 
     VLOG_CRITICAL << "Filter idle conjuncts";
-    // 5. Filter idle conjunct which already trans to olap filters
+    // 5.1 Filter idle conjunct which already trans to olap filters
     // this must be after build_scan_key, it will free the StringValue memory
     remove_pushed_conjuncts(state);
+    // 5.2 move the pushed function context
+    move_pushed_func_conjuncts(state);

Review Comment:
   Why use a new function here? I think we should treat `like` as a new predicate type that can be pushed down, and just reuse `remove_pushed_conjuncts` to do this job.
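The reviewer's idea is that a pushed-down LIKE conjunct needs no special cleanup path: its index can go into the same "pushed" set as the other predicates, and one pass removes them all. A hypothetical sketch (plain `int`s stand in for `ExprContext*`, and `remove_pushed` is not a Doris function):

```cpp
#include <set>
#include <vector>

// Keeps only the conjuncts whose index is not in `pushed_index`. If LIKE is
// registered as an ordinary pushable predicate, its index lands in this same
// set and no separate move_pushed_func_conjuncts-style pass is needed.
std::vector<int> remove_pushed(const std::vector<int>& conjuncts,
                               const std::set<int>& pushed_index) {
    std::vector<int> remaining;
    for (int i = 0; i < static_cast<int>(conjuncts.size()); ++i) {
        if (pushed_index.count(i) == 0) {
            remaining.push_back(conjuncts[i]);
        }
    }
    return remaining;
}
```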



##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -729,6 +729,72 @@ Status VOlapScanNode::build_olap_filters() {
     return Status::OK();
 }
 
+Status VOlapScanNode::build_function_filters() {
+    for (int conj_idx = 0; conj_idx < _conjunct_ctxs.size(); ++conj_idx) {
+        ExprContext* ex_ctx = _conjunct_ctxs[conj_idx];
+        Expr* fn_expr = ex_ctx->root();
+        bool opposite = false;
+
+        if (TExprNodeType::COMPOUND_PRED == fn_expr->node_type())
+        {
+            if (TExprOpcode::COMPOUND_NOT == fn_expr->op())
+            {
+                fn_expr = fn_expr->get_child(0);
+                opposite = true;
+            }
+        }
+
+        if (TExprNodeType::FUNCTION_CALL == fn_expr->node_type())
+        {
+            // currently only support like / not like
+            if ("like" == fn_expr->fn().name.function_name)

Review Comment:
   ditto



##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -729,6 +729,72 @@ Status VOlapScanNode::build_olap_filters() {
     return Status::OK();
 }
 
+Status VOlapScanNode::build_function_filters() {
+    for (int conj_idx = 0; conj_idx < _conjunct_ctxs.size(); ++conj_idx) {
+        ExprContext* ex_ctx = _conjunct_ctxs[conj_idx];
+        Expr* fn_expr = ex_ctx->root();
+        bool opposite = false;
+
+        if (TExprNodeType::COMPOUND_PRED == fn_expr->node_type())
+        {
+            if (TExprOpcode::COMPOUND_NOT == fn_expr->op())
+            {
+                fn_expr = fn_expr->get_child(0);
+                opposite = true;
+            }
+        }
+
+        if (TExprNodeType::FUNCTION_CALL == fn_expr->node_type())

Review Comment:
   Maybe you could make this if-block more concise:
   1. check whether this rule can be applied;
   2. compute `child_idx` for the SlotRef;
   3. use `child_idx` as the SlotRef and `1 - child_idx` as the StringLiteral.
   
   This process is very similar to the one used for other predicates.
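Steps 2 and 3 can be sketched as follows. The `Kind` enum and `slot_child_idx` helper are hypothetical; the real code would inspect the node types of the function call's two `Expr` children.

```cpp
#include <optional>

// Hypothetical child kinds; the real code looks at Doris Expr children.
enum class Kind { SLOT_REF, STRING_LITERAL, OTHER };

// Returns the index (0 or 1) of the SlotRef child when one child is a SlotRef
// and the other, at 1 - idx, is a StringLiteral. Otherwise the rule does not
// apply and std::nullopt is returned.
std::optional<int> slot_child_idx(Kind child0, Kind child1) {
    const Kind children[2] = {child0, child1};
    for (int idx = 0; idx < 2; ++idx) {
        if (children[idx] == Kind::SLOT_REF && children[1 - idx] == Kind::STRING_LITERAL) {
            return idx;
        }
    }
    return std::nullopt;
}
```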





[GitHub] [doris] yiguolei merged pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
yiguolei merged PR #10355:
URL: https://github.com/apache/doris/pull/10355




[GitHub] [doris] compasses commented on a diff in pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
compasses commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r915548760


##########
be/src/exprs/function_filter.h:
##########
@@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef DORIS_BE_SRC_FUNCTION_FILTER_H

Review Comment:
   OK, done.





[GitHub] [doris] yiguolei commented on a diff in pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r919724363


##########
be/src/common/config.h:
##########
@@ -763,6 +763,8 @@ CONF_Int32(quick_compaction_batch_size, "10");
 // do compaction min rowsets
 CONF_Int32(quick_compaction_min_rowsets, "10");
 
+CONF_mBool(enable_function_pushdown, "true");

Review Comment:
   Do not enable it by default. The new feature should be experimental, and it should only be enabled by default when 1.3 is released.
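Following this suggestion, the declaration in `be/src/common/config.h` would keep the same name and only flip the default (a sketch of the one-line change, not the merged code); users who want to try the feature would then have to turn `enable_function_pushdown` on explicitly in the BE configuration.

```cpp
// Experimental: LIKE / NOT LIKE pushdown to the storage engine, off by default.
CONF_mBool(enable_function_pushdown, "false");
```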





[GitHub] [doris] compasses commented on pull request #10355: Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
compasses commented on PR #10355:
URL: https://github.com/apache/doris/pull/10355#issuecomment-1166244799

   @mrhhsg Hi, could you help check the code below? It was changed in your recent commits, and it prevents the function pushdown in this PR from working in vectorized mode.
   
   BTW, pushing down like / not like can get a 2x performance gain in the vectorized execution engine.
   
   ```
   bool SegmentIterator::_can_evaluated_by_vectorized(ColumnPredicate* predicate) {
       auto cid = predicate->column_id();
       FieldType field_type = _schema.column(cid)->type();
       switch (predicate->type()) {
       case PredicateType::EQ:
       case PredicateType::NE:
       case PredicateType::LE:
       case PredicateType::LT:
       case PredicateType::GE:
       case PredicateType::GT: {
           if (field_type == OLAP_FIELD_TYPE_VARCHAR || field_type == OLAP_FIELD_TYPE_CHAR ||
               field_type == OLAP_FIELD_TYPE_STRING) {
               return config::enable_low_cardinality_optimize &&
                      _column_iterators[cid]->is_all_dict_encoding();
           } else if (field_type == OLAP_FIELD_TYPE_DECIMAL) {
               return false;
           }
           return true;
       }
       default:
           return false;
       }
   }
   ```
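The quoted function is a whitelist: only the six comparison types can return `true`, and every other predicate type falls through to `default`. A stripped-down model (hypothetical `PredicateType` with a `LIKE` member standing for whatever type the new predicate reports; the string and decimal column checks are omitted) shows that a LIKE predicate is always reported as not evaluable by the vectorized path unless a case is added for it.

```cpp
// Hypothetical, trimmed-down predicate kinds for illustration only.
enum class PredicateType { EQ, NE, LE, LT, GE, GT, LIKE };

// Same shape as the quoted switch: listed comparison types may be evaluated
// by the vectorized path; anything else, including LIKE, returns false.
bool can_evaluate_by_vectorized(PredicateType type) {
    switch (type) {
    case PredicateType::EQ:
    case PredicateType::NE:
    case PredicateType::LE:
    case PredicateType::LT:
    case PredicateType::GE:
    case PredicateType::GT:
        return true;  // the real code adds extra checks for string/decimal columns
    default:
        return false;
    }
}
```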




[GitHub] [doris] compasses commented on a diff in pull request #10355: Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
compasses commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r904979836


##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -729,6 +729,72 @@ Status VOlapScanNode::build_olap_filters() {
     return Status::OK();
 }
 
+Status VOlapScanNode::build_function_filters() {
+    for (int conj_idx = 0; conj_idx < _conjunct_ctxs.size(); ++conj_idx) {
+        ExprContext* ex_ctx = _conjunct_ctxs[conj_idx];
+        Expr* fn_expr = ex_ctx->root();
+        bool opposite = false;
+
+        if (TExprNodeType::COMPOUND_PRED == fn_expr->node_type())
+        {
+            if (TExprOpcode::COMPOUND_NOT == fn_expr->op())

Review Comment:
   Sure, makes sense.



##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -729,6 +729,72 @@ Status VOlapScanNode::build_olap_filters() {
     return Status::OK();
 }
 
+Status VOlapScanNode::build_function_filters() {
+    for (int conj_idx = 0; conj_idx < _conjunct_ctxs.size(); ++conj_idx) {
+        ExprContext* ex_ctx = _conjunct_ctxs[conj_idx];
+        Expr* fn_expr = ex_ctx->root();
+        bool opposite = false;
+
+        if (TExprNodeType::COMPOUND_PRED == fn_expr->node_type())
+        {
+            if (TExprOpcode::COMPOUND_NOT == fn_expr->op())
+            {
+                fn_expr = fn_expr->get_child(0);
+                opposite = true;
+            }
+        }
+
+        if (TExprNodeType::FUNCTION_CALL == fn_expr->node_type())
+        {
+            // currently only support like / not like
+            if ("like" == fn_expr->fn().name.function_name)

Review Comment:
   ok
   





[GitHub] [doris] mrhhsg commented on pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
mrhhsg commented on PR #10355:
URL: https://github.com/apache/doris/pull/10355#issuecomment-1166322866

   > @mrhhsg Hi, can you help check the changes as below, involved from your commits recently, which will lead to the function pushdown not work in this PR in vectorized mode.
   > 
   > BTW, the pushdown function of like / not like can get 2x performance gain in vectorized execution engine.
   > 
   > ```
   > bool SegmentIterator::_can_evaluated_by_vectorized(ColumnPredicate* predicate) {
   >     auto cid = predicate->column_id();
   >     FieldType field_type = _schema.column(cid)->type();
   >     switch (predicate->type()) {
   >     case PredicateType::EQ:
   >     case PredicateType::NE:
   >     case PredicateType::LE:
   >     case PredicateType::LT:
   >     case PredicateType::GE:
   >     case PredicateType::GT: {
   >         if (field_type == OLAP_FIELD_TYPE_VARCHAR || field_type == OLAP_FIELD_TYPE_CHAR ||
   >             field_type == OLAP_FIELD_TYPE_STRING) {
   >             return config::enable_low_cardinality_optimize &&
   >                    _column_iterators[cid]->is_all_dict_encoding();
   >         } else if (field_type == OLAP_FIELD_TYPE_DECIMAL) {
   >             return false;
   >         }
   >         return true;
   >     }
   >     default:
   >         return false;
   >     }
   > }
   > ```
   
   @compasses I am not sure; this logic should be the same as before.




[GitHub] [doris] yiguolei commented on a diff in pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r913741869


##########
be/src/olap/like_column_predicate.h:
##########
@@ -0,0 +1,82 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef DORIS_BE_SRC_OLAP_LIKE_COLUMN_PREDICATE_H

Review Comment:
   Use `#pragma once`.





[GitHub] [doris] yiguolei commented on a diff in pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r919725533


##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -973,29 +1037,31 @@ bool VOlapScanNode::is_key_column(const std::string& key_name) {
 }
 
 void VOlapScanNode::remove_pushed_conjuncts(RuntimeState* state) {
-    if (_pushed_conjuncts_index.empty()) {
+    if (_pushed_conjuncts_index.empty() && _pushed_func_conjuncts_index.empty()) {
         return;
     }
 
     // dispose direct conjunct first
     std::vector<ExprContext*> new_conjunct_ctxs;
     for (int i = 0; i < _direct_conjunct_size; ++i) {
-        if (std::find(_pushed_conjuncts_index.cbegin(), _pushed_conjuncts_index.cend(), i) ==
-            _pushed_conjuncts_index.cend()) {
-            new_conjunct_ctxs.emplace_back(_conjunct_ctxs[i]);
+        if (!_pushed_conjuncts_index.empty() && _pushed_conjuncts_index.count(i)) {

Review Comment:
   In the vectorized engine, it is `_vconjuncts`.






[GitHub] [doris] compasses commented on a diff in pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
compasses commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r922641376


##########
be/src/olap/like_column_predicate.cpp:
##########
@@ -0,0 +1,168 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "udf/udf.h"
+#include "olap/like_column_predicate.h"
+#include "olap/field.h"
+#include "runtime/string_value.hpp"
+#include "runtime/vectorized_row_batch.h"
+
+namespace doris {
+
+LikeColumnPredicate::LikeColumnPredicate(bool opposite, uint32_t column_id, doris_udf::FunctionContext* fn_ctx, doris_udf::StringVal val)
+    : ColumnPredicate(column_id, opposite), _fn_ctx(fn_ctx), pattern(val) {
+    _state = reinterpret_cast<LikePredicateState*>(_fn_ctx->get_function_state(doris_udf::FunctionContext::THREAD_LOCAL));
+}
+
+void LikeColumnPredicate::evaluate(VectorizedRowBatch* batch) const {
+    uint16_t n = batch->size();
+    uint16_t* sel = batch->selected();
+    if (!batch->selected_in_use()) {
+        for (uint16_t i = 0; i != n; ++i) {
+            sel[i] = i;
+        }
+    }
+}
+
+void LikeColumnPredicate::evaluate(ColumnBlock* block, uint16_t* sel, uint16_t* size) const {
+    if (block->is_nullable()) {
+        _base_evaluate<true>(block, sel, size);
+    } else {
+        _base_evaluate<false>(block, sel, size);
+    }
+}
+
+void LikeColumnPredicate::evaluate(vectorized::IColumn& column, uint16_t* sel, uint16_t* size) const {
+    uint16_t new_size = 0;
+
+    if (column.is_nullable()) {
+        auto* nullable_col = vectorized::check_and_get_column<vectorized::ColumnNullable>(column);
+        auto& null_map_data = nullable_col->get_null_map_column().get_data();
+        auto& nested_col = nullable_col->get_nested_column();
+        if (nested_col.is_column_dictionary()) {
+            auto* nested_col_ptr = vectorized::check_and_get_column<vectorized::ColumnDictionary<vectorized::Int32>>(nested_col);
+            auto& data_array = nested_col_ptr->get_data();
+            for (uint16_t i = 0; i < *size; i++) {
+                uint16_t idx = sel[i];
+                sel[new_size] = idx;
+                if (null_map_data[idx]) {
+                    new_size += _opposite;
+                    continue;
+                }
+
+                StringValue cell_value = nested_col_ptr->get_value(data_array[idx]);
+                doris_udf::StringVal target;
+                cell_value.to_string_val(&target);
+                new_size += _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+        else
+        {
+            for (uint16_t i = 0; i < *size; i++) {
+                uint16_t idx = sel[i];
+                sel[new_size] = idx;
+                if (null_map_data[idx]) {
+                    new_size += _opposite;
+                    continue;
+                }
+
+                StringRef cell_value = nested_col.get_data_at(idx);
+                doris_udf::StringVal target = cell_value.to_string_val();
+                new_size += _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+    } else {
+        if (column.is_column_dictionary()) {
+            auto* nested_col_ptr = vectorized::check_and_get_column<vectorized::ColumnDictionary<vectorized::Int32>>(column);
+            auto& data_array = nested_col_ptr->get_data();
+            for (uint16_t i = 0; i < *size; i++) {
+                uint16_t idx = sel[i];
+                sel[new_size] = idx;
+                StringValue cell_value = nested_col_ptr->get_value(data_array[idx]);
+                doris_udf::StringVal target;
+                cell_value.to_string_val(&target);
+                new_size += _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        } else {
+            for (uint16_t i = 0; i < *size; i++) {
+                uint16_t idx = sel[i];
+                sel[new_size] = idx;
+                StringRef cell_value = column.get_data_at(idx);
+                doris_udf::StringVal target = cell_value.to_string_val();
+                new_size += _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+    }
+
+    *size = new_size;
+}
+
+void LikeColumnPredicate::evaluate_vec(vectorized::IColumn& column, uint16_t size, bool* flags) const {
+    if (column.is_nullable()) {
+        auto* nullable_col = vectorized::check_and_get_column<vectorized::ColumnNullable>(column);
+        auto& null_map_data = nullable_col->get_null_map_column().get_data();
+        auto& nested_col = nullable_col->get_nested_column();
+        if (nested_col.is_column_dictionary()) {
+            auto* nested_col_ptr = vectorized::check_and_get_column<vectorized::ColumnDictionary<vectorized::Int32>>(nested_col);
+            auto& data_array = nested_col_ptr->get_data();
+            for (uint16_t i = 0; i < size; i++) {
+                if (null_map_data[i]) {
+                    flags[i] = _opposite;
+                    continue;
+                }
+
+                StringValue cell_value = nested_col_ptr->get_value(data_array[i]);
+                doris_udf::StringVal target;
+                cell_value.to_string_val(&target);
+                flags[i] = _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+        else
+        {
+            for (uint16_t i = 0; i < size; i++) {
+                if (null_map_data[i]) {
+                    flags[i] = _opposite;
+                    continue;
+                }
+
+                StringRef cell_value = nested_col.get_data_at(i);
+                doris_udf::StringVal target = cell_value.to_string_val();
+                flags[i] = _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+    } else {
+        if (column.is_column_dictionary()) {
+            auto* nested_col_ptr = vectorized::check_and_get_column<vectorized::ColumnDictionary<vectorized::Int32>>(column);
+            auto& data_array = nested_col_ptr->get_data();
+            for (uint16_t i = 0; i < size; i++) {
+                StringValue cell_value = nested_col_ptr->get_value(data_array[i]);
+                doris_udf::StringVal target;
+                cell_value.to_string_val(&target);
+                flags[i] = _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);

Review Comment:
   Yes, it is a minor improvement. Since this for loop has no branches to predict, I think it may be OK here; it should have relatively little performance impact.
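The `evaluate` loops in this file avoid a data-dependent branch when filtering: each candidate index is written to `sel[new_size]` unconditionally, and `new_size` advances by the 0/1 result (XORed with `_opposite` for NOT LIKE). A standalone sketch of that pattern, where `compact_selection` and the precomputed `matches` vector are hypothetical stand-ins for calling the LIKE function on each row:

```cpp
#include <cstdint>
#include <vector>

// Branch-free compaction of a selection vector. Because new_size <= i, the
// write to sel[new_size] only overwrites entries that were already read.
// `opposite` flips each result, mirroring NOT LIKE.
uint16_t compact_selection(uint16_t* sel, uint16_t size,
                           const std::vector<bool>& matches, bool opposite) {
    uint16_t new_size = 0;
    for (uint16_t i = 0; i < size; ++i) {
        uint16_t idx = sel[i];
        sel[new_size] = idx;
        // (opposite != match) is the boolean XOR; it adds 0 or 1.
        new_size += static_cast<uint16_t>(opposite != matches[idx]);
    }
    return new_size;
}
```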





[GitHub] [doris] compasses commented on a diff in pull request #10355: Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
compasses commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r904980824


##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -729,6 +729,72 @@ Status VOlapScanNode::build_olap_filters() {
     return Status::OK();
 }
 
+Status VOlapScanNode::build_function_filters() {
+    for (int conj_idx = 0; conj_idx < _conjunct_ctxs.size(); ++conj_idx) {
+        ExprContext* ex_ctx = _conjunct_ctxs[conj_idx];
+        Expr* fn_expr = ex_ctx->root();
+        bool opposite = false;
+
+        if (TExprNodeType::COMPOUND_PRED == fn_expr->node_type())
+        {
+            if (TExprOpcode::COMPOUND_NOT == fn_expr->op())
+            {
+                fn_expr = fn_expr->get_child(0);
+                opposite = true;
+            }
+        }
+
+        if (TExprNodeType::FUNCTION_CALL == fn_expr->node_type())

Review Comment:
   OK, makes sense; I will try to make it clearer.





[GitHub] [doris] compasses commented on a diff in pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
compasses commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r915548571


##########
be/src/olap/like_column_predicate.h:
##########
@@ -0,0 +1,82 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef DORIS_BE_SRC_OLAP_LIKE_COLUMN_PREDICATE_H

Review Comment:
   OK done.





[GitHub] [doris] yiguolei commented on a diff in pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r913742274


##########
be/src/exprs/function_filter.h:
##########
@@ -0,0 +1,43 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef DORIS_BE_SRC_FUNCTION_FILTER_H

Review Comment:
   Use `#pragma once` instead of the include guard.





[GitHub] [doris] github-actions[bot] commented on pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #10355:
URL: https://github.com/apache/doris/pull/10355#issuecomment-1174989497

   PR approved by anyone and no changes requested.




[GitHub] [doris] compasses commented on pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
compasses commented on PR #10355:
URL: https://github.com/apache/doris/pull/10355#issuecomment-1168139206

   OK. Previously the like predicate went through `_short_cir_eval_predicate`; now it goes through `_pre_eval_block_predicate`, so I made the like predicate support both paths.
   
   @Gabriel39 Hi, could you help review this PR? I hope it can be merged soon, because I am concerned it will conflict with other PRs and I will need to keep merging to resolve the conflicts :).
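As far as the quoted `like_column_predicate.cpp` shows, the predicate has two vectorized entry points. One compacts the selection vector `sel` in place and shrinks `size`. The other writes one entry per row into a `flags` array. The sketch below contrasts the two styles under simplifying assumptions:

* a `std::vector<std::string>` stands in for `IColumn`;
* a substring search stands in for the LIKE function with a `'%...%'` pattern;
* nulls and dictionary columns are ignored.

`contains`, `evaluate_sel` and `evaluate_flags` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Stand-in matcher for LIKE '%needle%' (Doris uses its own LIKE implementation).
inline bool contains(std::string_view s, std::string_view needle) {
    return s.find(needle) != std::string_view::npos;
}

// Selection-vector style: keep matching row indices at the front of sel.
// sel[new_size] is written unconditionally; new_size only advances on a match.
void evaluate_sel(const std::vector<std::string>& col, std::string_view needle,
                  bool opposite, uint16_t* sel, uint16_t* size) {
    uint16_t new_size = 0;
    for (uint16_t i = 0; i < *size; ++i) {
        uint16_t idx = sel[i];
        sel[new_size] = idx;
        new_size += (opposite ^ contains(col[idx], needle)) ? 1 : 0;
    }
    *size = new_size;
}

// Flag style: one boolean per row; the caller decides how to apply them.
void evaluate_flags(const std::vector<std::string>& col, std::string_view needle,
                    bool opposite, uint16_t size, bool* flags) {
    for (uint16_t i = 0; i < size; ++i) {
        flags[i] = opposite ^ contains(col[i], needle);
    }
}
```

Writing `sel[new_size]` unconditionally avoids a data-dependent branch in the loop. The quoted PR code uses the same pattern.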
   
   




[GitHub] [doris] yiguolei commented on a diff in pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r919616378


##########
be/src/olap/like_column_predicate.cpp:
##########
@@ -0,0 +1,168 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "udf/udf.h"
+#include "olap/like_column_predicate.h"
+#include "olap/field.h"
+#include "runtime/string_value.hpp"
+#include "runtime/vectorized_row_batch.h"
+
+namespace doris {
+
+LikeColumnPredicate::LikeColumnPredicate(bool opposite, uint32_t column_id, doris_udf::FunctionContext* fn_ctx, doris_udf::StringVal val)
+    : ColumnPredicate(column_id, opposite), _fn_ctx(fn_ctx), pattern(val) {
+    _state = reinterpret_cast<LikePredicateState*>(_fn_ctx->get_function_state(doris_udf::FunctionContext::THREAD_LOCAL));
+}
+
+void LikeColumnPredicate::evaluate(VectorizedRowBatch* batch) const {
+    uint16_t n = batch->size();
+    uint16_t* sel = batch->selected();
+    if (!batch->selected_in_use()) {
+        for (uint16_t i = 0; i != n; ++i) {
+            sel[i] = i;
+        }
+    }
+}
+
+void LikeColumnPredicate::evaluate(ColumnBlock* block, uint16_t* sel, uint16_t* size) const {
+    if (block->is_nullable()) {
+        _base_evaluate<true>(block, sel, size);
+    } else {
+        _base_evaluate<false>(block, sel, size);
+    }
+}
+
+void LikeColumnPredicate::evaluate(vectorized::IColumn& column, uint16_t* sel, uint16_t* size) const {
+    uint16_t new_size = 0;
+
+    if (column.is_nullable()) {
+        auto* nullable_col = vectorized::check_and_get_column<vectorized::ColumnNullable>(column);
+        auto& null_map_data = nullable_col->get_null_map_column().get_data();
+        auto& nested_col = nullable_col->get_nested_column();
+        if (nested_col.is_column_dictionary()) {
+            auto* nested_col_ptr = vectorized::check_and_get_column<vectorized::ColumnDictionary<vectorized::Int32>>(nested_col);
+            auto& data_array = nested_col_ptr->get_data();
+            for (uint16_t i = 0; i < *size; i++) {
+                uint16_t idx = sel[i];
+                sel[new_size] = idx;
+                if (null_map_data[idx]) {
+                    new_size += _opposite;
+                    continue;
+                }
+
+                StringValue cell_value = nested_col_ptr->get_value(data_array[idx]);
+                doris_udf::StringVal target;
+                cell_value.to_string_val(&target);
+                new_size += _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+        else
+        {
+            for (uint16_t i = 0; i < *size; i++) {
+                uint16_t idx = sel[i];
+                sel[new_size] = idx;
+                if (null_map_data[idx]) {
+                    new_size += _opposite;
+                    continue;
+                }
+
+                StringRef cell_value = nested_col.get_data_at(idx);
+                doris_udf::StringVal target = cell_value.to_string_val();
+                new_size += _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+    } else {
+        if (column.is_column_dictionary()) {
+            auto* nested_col_ptr = vectorized::check_and_get_column<vectorized::ColumnDictionary<vectorized::Int32>>(column);
+            auto& data_array = nested_col_ptr->get_data();
+            for (uint16_t i = 0; i < *size; i++) {
+                uint16_t idx = sel[i];
+                sel[new_size] = idx;
+                StringValue cell_value = nested_col_ptr->get_value(data_array[idx]);
+                doris_udf::StringVal target;
+                cell_value.to_string_val(&target);
+                new_size += _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        } else {
+            for (uint16_t i = 0; i < *size; i++) {
+                uint16_t idx = sel[i];
+                sel[new_size] = idx;
+                StringRef cell_value = column.get_data_at(idx);
+                doris_udf::StringVal target = cell_value.to_string_val();
+                new_size += _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+    }
+
+    *size = new_size;
+}
+
+void LikeColumnPredicate::evaluate_vec(vectorized::IColumn& column, uint16_t size, bool* flags) const {
+    if (column.is_nullable()) {
+        auto* nullable_col = vectorized::check_and_get_column<vectorized::ColumnNullable>(column);
+        auto& null_map_data = nullable_col->get_null_map_column().get_data();
+        auto& nested_col = nullable_col->get_nested_column();
+        if (nested_col.is_column_dictionary()) {
+            auto* nested_col_ptr = vectorized::check_and_get_column<vectorized::ColumnDictionary<vectorized::Int32>>(nested_col);
+            auto& data_array = nested_col_ptr->get_data();
+            for (uint16_t i = 0; i < size; i++) {
+                if (null_map_data[i]) {
+                    flags[i] = _opposite;
+                    continue;
+                }
+
+                StringValue cell_value = nested_col_ptr->get_value(data_array[i]);
+                doris_udf::StringVal target;
+                cell_value.to_string_val(&target);
+                flags[i] = _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+        else
+        {
+            for (uint16_t i = 0; i < size; i++) {
+                if (null_map_data[i]) {
+                    flags[i] = _opposite;
+                    continue;
+                }
+
+                StringRef cell_value = nested_col.get_data_at(i);
+                doris_udf::StringVal target = cell_value.to_string_val();
+                flags[i] = _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);
+            }
+        }
+    } else {
+        if (column.is_column_dictionary()) {
+            auto* nested_col_ptr = vectorized::check_and_get_column<vectorized::ColumnDictionary<vectorized::Int32>>(column);
+            auto& data_array = nested_col_ptr->get_data();
+            for (uint16_t i = 0; i < size; i++) {
+                StringValue cell_value = nested_col_ptr->get_value(data_array[i]);
+                doris_udf::StringVal target;
+                cell_value.to_string_val(&target);
+                flags[i] = _opposite ^ ((_state->function)(_fn_ctx, target, pattern).val);

Review Comment:
   Better to make `_opposite` a template parameter and use `if constexpr (opposite)` here; performance is better.
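Here is a minimal sketch of the reviewer's suggestion. As in the other sketches, a substring search stands in for the LIKE call and a `std::vector<std::string>` stands in for the column. `like_contains`, `evaluate_vec_impl` and this `evaluate_vec` signature are hypothetical. The runtime `opposite` flag is checked once per call and dispatches to one of two instantiations, so each loop body is specialized at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for the LIKE function call: substring match for '%needle%'.
inline bool like_contains(std::string_view s, std::string_view needle) {
    return s.find(needle) != std::string_view::npos;
}

// OPPOSITE is fixed at compile time, so the branch below is resolved per instantiation.
template <bool OPPOSITE>
void evaluate_vec_impl(const std::vector<std::string>& col, std::string_view needle,
                       uint16_t size, bool* flags) {
    for (uint16_t i = 0; i < size; ++i) {
        bool matched = like_contains(col[i], needle);
        if constexpr (OPPOSITE) {
            flags[i] = !matched;  // NOT LIKE
        } else {
            flags[i] = matched;   // LIKE
        }
    }
}

// The runtime flag is tested once per call instead of being mixed in per row.
void evaluate_vec(bool opposite, const std::vector<std::string>& col,
                  std::string_view needle, uint16_t size, bool* flags) {
    if (opposite) {
        evaluate_vec_impl<true>(col, needle, size, flags);
    } else {
        evaluate_vec_impl<false>(col, needle, size, flags);
    }
}
```

The real gain likely depends on how much the LIKE call dominates each iteration. The per-row XOR it replaces is cheap, so the benefit is mainly a simpler, fully specialized loop body.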





[GitHub] [doris] compasses commented on a diff in pull request #10355: [Optimize] Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
compasses commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r922641011


##########
be/src/common/config.h:
##########
@@ -763,6 +763,8 @@ CONF_Int32(quick_compaction_batch_size, "10");
 // do compaction min rowsets
 CONF_Int32(quick_compaction_min_rowsets, "10");
 
+CONF_mBool(enable_function_pushdown, "true");

Review Comment:
   OK, done.





[GitHub] [doris] compasses commented on a diff in pull request #10355: Improve performance like/not like filter through pushdown function to storage engine

Posted by GitBox <gi...@apache.org>.
compasses commented on code in PR #10355:
URL: https://github.com/apache/doris/pull/10355#discussion_r904986902


##########
be/src/vec/exec/volap_scan_node.cpp:
##########
@@ -772,17 +838,22 @@ Status VOlapScanNode::start_scan(RuntimeState* state) {
     }
 
     VLOG_CRITICAL << "BuildOlapFilters";
-    // 3. Using ColumnValueRange to Build StorageEngine filters
+    // 3.1 Using ColumnValueRange to Build StorageEngine filters
     RETURN_IF_ERROR(build_olap_filters());
+    // 3.2 Function pushdown
+    if (config::enable_function_pushdown)
+        RETURN_IF_ERROR(build_function_filters());
 
     VLOG_CRITICAL << "BuildScanKey";
     // 4. Using `Key Column`'s ColumnValueRange to split ScanRange to several `Sub ScanRange`
     RETURN_IF_ERROR(build_scan_key());
 
     VLOG_CRITICAL << "Filter idle conjuncts";
-    // 5. Filter idle conjunct which already trans to olap filters
+    // 5.1 Filter idle conjunct which already trans to olap filters
     // this must be after build_scan_key, it will free the StringValue memory
     remove_pushed_conjuncts(state);
+    // 5.2 move the pushed function context
+    move_pushed_func_conjuncts(state);

Review Comment:
   We cannot reuse `remove_pushed_conjuncts` here. We only push the function's execution down into storage; the function context, and some memory referenced by the pushed function, are still held in `_pushed_func_conjunct_ctxs` and are closed when the scan node is closed.
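Below is a minimal sketch of the ownership split described in this comment. It assumes a single hypothetical `FnContext` type in place of Doris's expression and function contexts. The storage-side predicate only borrows a raw pointer. The scan node therefore moves the owning pointer into `_pushed_func_conjunct_ctxs` instead of destroying it, and closes it in `close()`. `ScanNode`, `PushedPredicate` and `push_down_functions` are illustrative names.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical function context; closed_flag lets a caller observe close().
struct FnContext {
    bool* closed_flag;
    void close() { *closed_flag = true; }
};

// Storage-side predicate: borrows the context, does not own it.
struct PushedPredicate {
    const FnContext* ctx;
};

class ScanNode {
public:
    void add_conjunct(std::unique_ptr<FnContext> ctx) {
        _conjunct_ctxs.push_back(std::move(ctx));
    }

    // Push every conjunct down, but move (not destroy) its context so the
    // predicates' borrowed pointers stay valid for the lifetime of the node.
    std::vector<PushedPredicate> push_down_functions() {
        std::vector<PushedPredicate> preds;
        for (auto& ctx : _conjunct_ctxs) {
            preds.push_back(PushedPredicate{ctx.get()});  // take the pointer before moving
            _pushed_func_conjunct_ctxs.push_back(std::move(ctx));
        }
        _conjunct_ctxs.clear();  // only moved-from (null) pointers remain
        return preds;
    }

    // Pushed contexts are released only when the scan node itself closes.
    void close() {
        for (auto& ctx : _pushed_func_conjunct_ctxs) {
            ctx->close();
        }
        _pushed_func_conjunct_ctxs.clear();
    }

private:
    std::vector<std::unique_ptr<FnContext>> _conjunct_ctxs;
    std::vector<std::unique_ptr<FnContext>> _pushed_func_conjunct_ctxs;
};
```

Removing the conjunct outright, as a plain "remove pushed conjuncts" step would, would free the context while storage still points at it. Keeping it in a separate list ties its lifetime to the scan node.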


