Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/10/12 05:35:59 UTC

[GitHub] [doris] Gabriel39 opened a new pull request, #13314: [Improvement](like) Change `like` function to batch call

Gabriel39 opened a new pull request, #13314:
URL: https://github.com/apache/doris/pull/13314

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: 
       - [ ] Yes
       - [ ] No
       - [ ] I don't know
   2. Has unit tests been added:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   3. Has document been added or modified:
       - [ ] Yes
       - [ ] No
       - [ ] No Need
   4. Does it need to update dependencies:
       - [ ] Yes
       - [ ] No
   5. Are there any changes that cannot be rolled back:
       - [ ] Yes (If Yes, please explain WHY)
       - [ ] No
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #13314: [Improvement](like) Change `like` function to batch call

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13314:
URL: https://github.com/apache/doris/pull/13314#issuecomment-1279891290

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] hello-stephen commented on pull request #13314: [Improvement](like) Change `like` function to batch call

Posted by GitBox <gi...@apache.org>.
hello-stephen commented on PR #13314:
URL: https://github.com/apache/doris/pull/13314#issuecomment-1279751163

   https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20221015215615_clickbench_pr_29762.html




[GitHub] [doris] Gabriel39 merged pull request #13314: [Improvement](like) Change `like` function to batch call

Posted by GitBox <gi...@apache.org>.
Gabriel39 merged PR #13314:
URL: https://github.com/apache/doris/pull/13314




[GitHub] [doris] github-actions[bot] commented on pull request #13314: [Improvement](like) Change `like` function to batch call

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13314:
URL: https://github.com/apache/doris/pull/13314#issuecomment-1279891295

   PR approved by anyone and no changes requested.




[GitHub] [doris] HappenLee commented on a diff in pull request #13314: [Improvement](like) Change `like` function to batch call

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #13314:
URL: https://github.com/apache/doris/pull/13314#discussion_r995585935


##########
be/src/vec/functions/like.cpp:
##########
@@ -63,35 +63,171 @@ Status LikeSearchState::clone(LikeSearchState& cloned) {
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_starts_with_fn(LikeSearchState* state, const StringValue& val,
+Status FunctionLikeBase::constant_starts_with_fn(LikeSearchState* state, const ColumnString& val,
                                                  const StringValue& pattern,
-                                                 unsigned char* result) {
-    *result = (val.len >= state->search_string_sv.len) &&
-              (state->search_string_sv == val.substring(0, state->search_string_sv.len));
+                                                 ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        result[i] = (str_ref.size >= state->search_string_sv.size) &&
+                    str_ref.start_with(state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_ends_with_fn(LikeSearchState* state, const ColumnString& val,
+                                               const StringValue& pattern,
+                                               ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        result[i] = (str_ref.size >= state->search_string_sv.size) &&
+                    str_ref.end_with(state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_equals_fn(LikeSearchState* state, const ColumnString& val,
+                                            const StringValue& pattern,
+                                            ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (val.get_data_at(i) == state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_substring_fn(LikeSearchState* state, const ColumnString& val,
+                                               const StringValue& pattern,
+                                               ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        if (state->search_string_sv.size == 0) {
+            result[i] = true;

Review Comment:
   Why does this set only one result element and then return? That seems wrong.
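
The concern raised here can be illustrated with a minimal sketch. It is not the Doris implementation: `std::vector<std::string>` stands in for `ColumnString`, `std::vector<uint8_t>` stands in for `ColumnUInt8::Container`, `std::string::find` replaces `substring_pattern.search`, and the name `substring_match_batch` is hypothetical. The point is only that the empty-pattern check is hoisted out of the loop so that every row of the batch receives a result, instead of returning after the first row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a batch substring matcher.
// rows   : one string per row (assumed substitute for ColumnString)
// needle : the constant search string
// result : one byte per row, 1 for a match and 0 otherwise
void substring_match_batch(const std::vector<std::string>& rows,
                           const std::string& needle,
                           std::vector<uint8_t>& result) {
    // The caller is expected to have sized the result container already.
    assert(result.size() >= rows.size());

    if (needle.empty()) {
        // An empty pattern matches every row, so fill the whole batch
        // rather than writing a single element and returning.
        std::fill_n(result.begin(), rows.size(), uint8_t{1});
        return;
    }

    for (std::size_t i = 0; i < rows.size(); i++) {
        result[i] = rows[i].find(needle) != std::string::npos;
    }
}
```

With the check outside the loop, the branch is also evaluated once per batch instead of once per row.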



##########
be/src/vec/functions/like.cpp:
##########
@@ -63,35 +63,171 @@ Status LikeSearchState::clone(LikeSearchState& cloned) {
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_starts_with_fn(LikeSearchState* state, const StringValue& val,
+Status FunctionLikeBase::constant_starts_with_fn(LikeSearchState* state, const ColumnString& val,
                                                  const StringValue& pattern,
-                                                 unsigned char* result) {
-    *result = (val.len >= state->search_string_sv.len) &&
-              (state->search_string_sv == val.substring(0, state->search_string_sv.len));
+                                                 ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        result[i] = (str_ref.size >= state->search_string_sv.size) &&
+                    str_ref.start_with(state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_ends_with_fn(LikeSearchState* state, const ColumnString& val,
+                                               const StringValue& pattern,
+                                               ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        result[i] = (str_ref.size >= state->search_string_sv.size) &&
+                    str_ref.end_with(state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_equals_fn(LikeSearchState* state, const ColumnString& val,
+                                            const StringValue& pattern,
+                                            ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (val.get_data_at(i) == state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_substring_fn(LikeSearchState* state, const ColumnString& val,
+                                               const StringValue& pattern,
+                                               ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        if (state->search_string_sv.size == 0) {
+            result[i] = true;
+            return Status::OK();
+        }
+        result[i] = state->substring_pattern.search(val.get_data_at(i)) != -1;
+    }
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_ends_with_fn(LikeSearchState* state, const StringValue& val,
-                                               const StringValue& pattern, unsigned char* result) {
-    *result = (val.len >= state->search_string_sv.len) &&
-              (state->search_string_sv ==
-               val.substring(val.len - state->search_string_sv.len, state->search_string_sv.len));
+Status FunctionLikeBase::constant_starts_with_fn_predicate(
+        LikeSearchState* state, const PredicateColumnType<TYPE_STRING>& val,
+        const StringValue& pattern, ColumnUInt8::Container& result, uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (data_ptr[sel[i]].size >= state->search_string_sv.size) &&
+                    (state->search_string_sv ==
+                     data_ptr[sel[i]].substring(0, state->search_string_sv.size));
+    }
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_equals_fn(LikeSearchState* state, const StringValue& val,
-                                            const StringValue& pattern, unsigned char* result) {
+Status FunctionLikeBase::constant_ends_with_fn_predicate(
+        LikeSearchState* state, const PredicateColumnType<TYPE_STRING>& val,
+        const StringValue& pattern, ColumnUInt8::Container& result, uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        result[i] =
+                (data_ptr[sel[i]].size >= state->search_string_sv.size) &&
+                (state->search_string_sv ==
+                 data_ptr[sel[i]].substring(data_ptr[sel[i]].size - state->search_string_sv.size,
+                                            state->search_string_sv.size));
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_equals_fn_predicate(LikeSearchState* state,
+                                                      const PredicateColumnType<TYPE_STRING>& val,
+                                                      const StringValue& pattern,
+                                                      ColumnUInt8::Container& result, uint16_t* sel,
+                                                      size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (data_ptr[sel[i]] == state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_substring_fn_predicate(
+        LikeSearchState* state, const PredicateColumnType<TYPE_STRING>& val,
+        const StringValue& pattern, ColumnUInt8::Container& result, uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        if (state->search_string_sv.size == 0) {
+            result[i] = true;

Review Comment:
   Same as above.
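
The same early-return issue applies to the selection-vector variant. The sketch below assumes `std::vector<std::string_view>` as a substitute for the predicate column's data, a plain `uint16_t` array as the selection vector, and the hypothetical name `substring_match_selected`. It shows one way the loop could be written so that all `sz` selected rows get a result when the pattern is empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a substring matcher driven by a selection vector.
// data   : all rows of the column (assumed substitute for the predicate column)
// needle : the constant search string
// result : dense output, one byte per selected row
// sel    : indices into data for the rows to evaluate
// sz     : number of entries in sel
void substring_match_selected(const std::vector<std::string_view>& data,
                              std::string_view needle,
                              std::vector<uint8_t>& result,
                              const uint16_t* sel, std::size_t sz) {
    assert(result.size() >= sz);

    if (needle.empty()) {
        // Empty pattern: every selected row matches.
        std::fill_n(result.begin(), sz, uint8_t{1});
        return;
    }

    for (std::size_t i = 0; i < sz; i++) {
        // sel[i] picks the source row; result stays densely indexed by i.
        result[i] = data[sel[i]].find(needle) != std::string_view::npos;
    }
}
```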



##########
be/src/vec/functions/like.h:
##########
@@ -166,11 +169,64 @@ class FunctionLikeBase : public IFunction {
                                                  const StringValue* values, uint16_t size,
                                                  unsigned char* result);
 
-    static Status constant_regex_fn(LikeSearchState* state, const StringValue& val,
-                                    const StringValue& pattern, unsigned char* result);
+    static Status constant_regex_fn(LikeSearchState* state, const ColumnString& val,
+                                    const StringValue& pattern, ColumnUInt8::Container& result);
+
+    static Status regexp_fn(LikeSearchState* state, const ColumnString& val,
+                            const StringValue& pattern, ColumnUInt8::Container& result);
+

Review Comment:
   Add a comment noting that the functions named `fn_predicate` execute only in the storage engine? Or choose a better name?
   





[GitHub] [doris] HappenLee commented on a diff in pull request #13314: [Improvement](like) Change `like` function to batch call

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #13314:
URL: https://github.com/apache/doris/pull/13314#discussion_r995598114


##########
be/src/vec/functions/like.cpp:
##########
@@ -63,35 +63,171 @@ Status LikeSearchState::clone(LikeSearchState& cloned) {
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_starts_with_fn(LikeSearchState* state, const StringValue& val,
+Status FunctionLikeBase::constant_starts_with_fn(LikeSearchState* state, const ColumnString& val,
                                                  const StringValue& pattern,
-                                                 unsigned char* result) {
-    *result = (val.len >= state->search_string_sv.len) &&
-              (state->search_string_sv == val.substring(0, state->search_string_sv.len));
+                                                 ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        result[i] = (str_ref.size >= state->search_string_sv.size) &&
+                    str_ref.start_with(state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_ends_with_fn(LikeSearchState* state, const ColumnString& val,
+                                               const StringValue& pattern,
+                                               ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        result[i] = (str_ref.size >= state->search_string_sv.size) &&
+                    str_ref.end_with(state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_equals_fn(LikeSearchState* state, const ColumnString& val,
+                                            const StringValue& pattern,
+                                            ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (val.get_data_at(i) == state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_substring_fn(LikeSearchState* state, const ColumnString& val,
+                                               const StringValue& pattern,
+                                               ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        if (state->search_string_sv.size == 0) {
+            result[i] = true;
+            return Status::OK();
+        }
+        result[i] = state->substring_pattern.search(val.get_data_at(i)) != -1;
+    }
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_ends_with_fn(LikeSearchState* state, const StringValue& val,
-                                               const StringValue& pattern, unsigned char* result) {
-    *result = (val.len >= state->search_string_sv.len) &&
-              (state->search_string_sv ==
-               val.substring(val.len - state->search_string_sv.len, state->search_string_sv.len));
+Status FunctionLikeBase::constant_starts_with_fn_predicate(
+        LikeSearchState* state, const PredicateColumnType<TYPE_STRING>& val,
+        const StringValue& pattern, ColumnUInt8::Container& result, uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (data_ptr[sel[i]].size >= state->search_string_sv.size) &&
+                    (state->search_string_sv ==
+                     data_ptr[sel[i]].substring(0, state->search_string_sv.size));
+    }
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_equals_fn(LikeSearchState* state, const StringValue& val,
-                                            const StringValue& pattern, unsigned char* result) {
+Status FunctionLikeBase::constant_ends_with_fn_predicate(
+        LikeSearchState* state, const PredicateColumnType<TYPE_STRING>& val,
+        const StringValue& pattern, ColumnUInt8::Container& result, uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        result[i] =
+                (data_ptr[sel[i]].size >= state->search_string_sv.size) &&
+                (state->search_string_sv ==
+                 data_ptr[sel[i]].substring(data_ptr[sel[i]].size - state->search_string_sv.size,
+                                            state->search_string_sv.size));
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_equals_fn_predicate(LikeSearchState* state,
+                                                      const PredicateColumnType<TYPE_STRING>& val,
+                                                      const StringValue& pattern,
+                                                      ColumnUInt8::Container& result, uint16_t* sel,
+                                                      size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (data_ptr[sel[i]] == state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_substring_fn_predicate(
+        LikeSearchState* state, const PredicateColumnType<TYPE_STRING>& val,
+        const StringValue& pattern, ColumnUInt8::Container& result, uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        if (state->search_string_sv.size == 0) {
+            result[i] = true;

Review Comment:
   Same as above.





[GitHub] [doris] HappenLee commented on a diff in pull request #13314: [Improvement](like) Change `like` function to batch call

Posted by GitBox <gi...@apache.org>.
HappenLee commented on code in PR #13314:
URL: https://github.com/apache/doris/pull/13314#discussion_r995661021


##########
be/src/vec/functions/like.cpp:
##########
@@ -120,29 +256,83 @@ Status FunctionLikeBase::constant_substring_fn_vec_dict(LikeSearchState* state,
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_regex_fn(LikeSearchState* state, const StringValue& val,
-                                           const StringValue& pattern, unsigned char* result) {
-    auto ret = hs_scan(state->hs_database.get(), val.ptr, val.len, 0, state->hs_scratch.get(),
-                       state->hs_match_handler, (void*)result);
-    if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
-        return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+Status FunctionLikeBase::constant_regex_fn(LikeSearchState* state, const ColumnString& val,
+                                           const StringValue& pattern,
+                                           ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        auto ret = hs_scan(state->hs_database.get(), str_ref.data, str_ref.size, 0,
+                           state->hs_scratch.get(), state->hs_match_handler,
+                           (void*)(result.data() + i));
+        if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
+            return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+        }
     }
 
     return Status::OK();
 }
 
-Status FunctionLikeBase::regexp_fn(LikeSearchState* state, const StringValue& val,
-                                   const StringValue& pattern, unsigned char* result) {
+Status FunctionLikeBase::regexp_fn(LikeSearchState* state, const ColumnString& val,
+                                   const StringValue& pattern, ColumnUInt8::Container& result) {
     std::string re_pattern(pattern.ptr, pattern.len);

Review Comment:
   Consider `std::string_view` here.
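
The suggestion presumably targets the `std::string re_pattern(pattern.ptr, pattern.len)` line, which copies the pattern bytes on every call. The sketch below contrasts that copy with a non-owning `std::string_view`. `PatternRef`, `as_copy`, and `as_view` are hypothetical names used only for illustration; `PatternRef` is an assumed stand-in for a (pointer, length) type such as `StringValue`. Whether the regex library in use accepts a view directly is not stated in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a (pointer, length) string reference.
struct PatternRef {
    const char* ptr;
    std::size_t len;
};

// Copying form: allocates and copies len bytes into a new std::string.
inline std::string as_copy(const PatternRef& pattern) {
    return std::string(pattern.ptr, pattern.len);
}

// Non-owning form: refers to the same bytes without allocating.
// The view must not outlive the buffer that pattern.ptr points to.
inline std::string_view as_view(const PatternRef& pattern) {
    assert(pattern.ptr != nullptr || pattern.len == 0);
    return std::string_view(pattern.ptr, pattern.len);
}
```

The view is only safe while the underlying pattern buffer is alive, so the copying form remains the right choice if the pattern has to be stored beyond the call.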



##########
be/src/vec/functions/like.cpp:
##########
@@ -63,35 +63,171 @@ Status LikeSearchState::clone(LikeSearchState& cloned) {
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_starts_with_fn(LikeSearchState* state, const StringValue& val,
+Status FunctionLikeBase::constant_starts_with_fn(LikeSearchState* state, const ColumnString& val,
                                                  const StringValue& pattern,
-                                                 unsigned char* result) {
-    *result = (val.len >= state->search_string_sv.len) &&
-              (state->search_string_sv == val.substring(0, state->search_string_sv.len));
+                                                 ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        result[i] = (str_ref.size >= state->search_string_sv.size) &&
+                    str_ref.start_with(state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_ends_with_fn(LikeSearchState* state, const ColumnString& val,
+                                               const StringValue& pattern,
+                                               ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        result[i] = (str_ref.size >= state->search_string_sv.size) &&
+                    str_ref.end_with(state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_equals_fn(LikeSearchState* state, const ColumnString& val,
+                                            const StringValue& pattern,
+                                            ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (val.get_data_at(i) == state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_substring_fn(LikeSearchState* state, const ColumnString& val,
+                                               const StringValue& pattern,
+                                               ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        if (state->search_string_sv.size == 0) {
+            result[i] = true;
+            return Status::OK();
+        }
+        result[i] = state->substring_pattern.search(val.get_data_at(i)) != -1;
+    }
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_ends_with_fn(LikeSearchState* state, const StringValue& val,
-                                               const StringValue& pattern, unsigned char* result) {
-    *result = (val.len >= state->search_string_sv.len) &&
-              (state->search_string_sv ==
-               val.substring(val.len - state->search_string_sv.len, state->search_string_sv.len));
+Status FunctionLikeBase::constant_starts_with_fn_predicate(
+        LikeSearchState* state, const PredicateColumnType<TYPE_STRING>& val,
+        const StringValue& pattern, ColumnUInt8::Container& result, uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (data_ptr[sel[i]].size >= state->search_string_sv.size) &&
+                    (state->search_string_sv ==
+                     data_ptr[sel[i]].substring(0, state->search_string_sv.size));
+    }
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_equals_fn(LikeSearchState* state, const StringValue& val,
-                                            const StringValue& pattern, unsigned char* result) {
+Status FunctionLikeBase::constant_ends_with_fn_predicate(
+        LikeSearchState* state, const PredicateColumnType<TYPE_STRING>& val,
+        const StringValue& pattern, ColumnUInt8::Container& result, uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        result[i] =
+                (data_ptr[sel[i]].size >= state->search_string_sv.size) &&
+                (state->search_string_sv ==
+                 data_ptr[sel[i]].substring(data_ptr[sel[i]].size - state->search_string_sv.size,
+                                            state->search_string_sv.size));
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_equals_fn_predicate(LikeSearchState* state,
+                                                      const PredicateColumnType<TYPE_STRING>& val,
+                                                      const StringValue& pattern,
+                                                      ColumnUInt8::Container& result, uint16_t* sel,
+                                                      size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        result[i] = (data_ptr[sel[i]] == state->search_string_sv);
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_substring_fn_predicate(
+        LikeSearchState* state, const PredicateColumnType<TYPE_STRING>& val,
+        const StringValue& pattern, ColumnUInt8::Container& result, uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        if (state->search_string_sv.size == 0) {
+            result[i] = true;
+            continue;
+        }
+        result[i] = state->substring_pattern.search(data_ptr[sel[i]]) != -1;
+    }
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_starts_with_fn_scalar(LikeSearchState* state,
+                                                        const StringRef& val,
+                                                        const StringValue& pattern,
+                                                        unsigned char* result) {
+    *result = (val.size >= state->search_string_sv.size) &&
+              (state->search_string_sv == val.substring(0, state->search_string_sv.size));
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_ends_with_fn_scalar(LikeSearchState* state, const StringRef& val,
+                                                      const StringValue& pattern,
+                                                      unsigned char* result) {
+    *result = (val.size >= state->search_string_sv.size) &&
+              (state->search_string_sv == val.substring(val.size - state->search_string_sv.size,
+                                                        state->search_string_sv.size));
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_equals_fn_scalar(LikeSearchState* state, const StringRef& val,
+                                                   const StringValue& pattern,
+                                                   unsigned char* result) {
     *result = (val == state->search_string_sv);
     return Status::OK();
 }
 
-Status FunctionLikeBase::constant_substring_fn(LikeSearchState* state, const StringValue& val,
-                                               const StringValue& pattern, unsigned char* result) {
-    if (state->search_string_sv.len == 0) {
+Status FunctionLikeBase::constant_substring_fn_scalar(LikeSearchState* state, const StringRef& val,
+                                                      const StringValue& pattern,
+                                                      unsigned char* result) {
+    if (state->search_string_sv.size == 0) {
         *result = true;
         return Status::OK();
     }
-    *result = state->substring_pattern.search(&val) != -1;
+    *result = state->substring_pattern.search(val) != -1;
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_regex_fn_scalar(LikeSearchState* state, const StringRef& val,
+                                                  const StringValue& pattern,
+                                                  unsigned char* result) {
+    auto ret = hs_scan(state->hs_database.get(), val.data, val.size, 0, state->hs_scratch.get(),
+                       state->hs_match_handler, (void*)result);
+    if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
+        return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+    }
+
+    return Status::OK();
+}
+
+Status FunctionLikeBase::regexp_fn_scalar(LikeSearchState* state, const StringRef& val,
+                                          const StringValue& pattern, unsigned char* result) {
+    std::string re_pattern(pattern.ptr, pattern.len);

Review Comment:
   Maybe better to use `std::string_view` here?
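
   To make the suggestion concrete, here is a minimal sketch of the difference between an owning copy and a non-owning view. `PatternRef` is a hypothetical stand-in for a pointer-plus-length type like `StringValue`, not the real Doris struct, and `as_string` / `as_view` are illustrative names only. Whether the regex code that consumes `re_pattern` can accept a view is not visible in this hunk, so treat that part as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a (pointer, length) string type; not Doris's StringValue.
struct PatternRef {
    const char* ptr;
    std::size_t len;
};

// Owning copy: the bytes are copied into a new std::string
// (heap-allocated once the length exceeds the small-string buffer).
std::string as_string(const PatternRef& p) {
    return std::string(p.ptr, p.len);
}

// Non-owning view: only the pointer and length are stored, nothing is copied.
// The caller must keep the underlying buffer alive while the view is in use.
std::string_view as_view(const PatternRef& p) {
    assert(p.ptr != nullptr || p.len == 0);
    return std::string_view(p.ptr, p.len);
}
```

   With a buffer `buf` and `PatternRef p{buf, n}`, `as_view(p).data()` is `buf` itself, while `as_string(p).data()` points at a separate copy. The view is only safe as long as the pattern column outlives it.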



##########
be/src/vec/functions/like.cpp:
##########
@@ -196,22 +386,28 @@ Status FunctionLikeBase::execute_impl(FunctionContext* context, Block& block,
             context->get_function_state(FunctionContext::THREAD_LOCAL));
     // for constant_substring_fn, use long run length search for performance
     if (constant_substring_fn ==
-        *(state->function.target<doris::Status (*)(LikeSearchState * state, const StringValue&,
-                                                   const StringValue&, unsigned char*)>())) {
+        *(state->function
+                  .target<doris::Status (*)(LikeSearchState * state, const ColumnString&,
+                                            const StringValue&, ColumnUInt8::Container&)>())) {
         RETURN_IF_ERROR(execute_substring(values->get_chars(), values->get_offsets(), vec_res,
-                                          state->function, &state->search_state));
+                                          &state->search_state));
     } else {
         const auto pattern_col = block.get_by_position(arguments[1]).column;
 
-        if (const auto* patterns = check_and_get_column<ColumnString>(pattern_col.get())) {
-            RETURN_IF_ERROR(vector_vector(values->get_chars(), values->get_offsets(),
-                                          patterns->get_chars(), patterns->get_offsets(), vec_res,
-                                          state->function, &state->search_state));
+        if (const auto* str_patterns = check_and_get_column<ColumnString>(pattern_col.get())) {
+            if (str_patterns->size() != 1) {

Review Comment:
   It seems a `DCHECK` alone would be better here?
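
   A rough sketch of the trade-off, under stated assumptions: `SketchStatus` and both function names are hypothetical, and the standard `assert` is used as a stand-in for glog's `DCHECK`, since both are compiled out when `NDEBUG` is defined. This is not the code in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal status type for illustration; not Doris's Status.
struct SketchStatus {
    bool ok;
    std::string msg;
};

// Option A: runtime check. Active in every build type; a violation is
// reported to the caller as an error instead of aborting the process.
SketchStatus check_pattern_rows_runtime(std::size_t rows) {
    if (rows != 1) {
        return {false, "pattern column must hold exactly one row, got " +
                               std::to_string(rows)};
    }
    return {true, ""};
}

// Option B: debug-only check. In debug builds a violation aborts;
// in release builds (NDEBUG defined) the check disappears entirely.
SketchStatus check_pattern_rows_debug_only(std::size_t rows) {
    assert(rows == 1 && "pattern column must hold exactly one row");
    (void)rows; // avoid an unused-parameter warning when assert is compiled out
    return {true, ""};
}
```

   Option B fits if the planner already guarantees a single-row pattern column, so a violation can only be a programming error. Option A fits if a bad input could reach this point in production.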


