Posted to commits@doris.apache.org by "xiaokang (via GitHub)" <gi...@apache.org> on 2023/04/03 10:37:15 UTC

[GitHub] [doris] xiaokang opened a new pull request, #18350: [Enhencement](like) fallback to re2 if hyperscan failed

xiaokang opened a new pull request, #18350:
URL: https://github.com/apache/doris/pull/18350

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
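   Fall back to RE2 when Hyperscan fails to prepare (compile) the pattern in `FunctionLikeBase`, instead of returning an error.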
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior?
   * [ ] Have unit tests been added?
   * [ ] Has documentation been added or modified?
   * [ ] Does it need to update dependencies?
   * [ ] Does this PR support rollback? (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] jacktengg commented on pull request #18350: [Enhencement](like) fallback to re2 if hyperscan failed

Posted by "jacktengg (via GitHub)" <gi...@apache.org>.
jacktengg commented on PR #18350:
URL: https://github.com/apache/doris/pull/18350#issuecomment-1500168033

   LGTM



[GitHub] [doris] github-actions[bot] commented on a diff in pull request #18350: [Enhencement](like) fallback to re2 if hyperscan failed

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #18350:
URL: https://github.com/apache/doris/pull/18350#discussion_r1155790678


##########
be/src/vec/functions/like.cpp:
##########
@@ -196,6 +196,39 @@ Status FunctionLikeBase::constant_substring_fn_scalar(LikeSearchState* state, co
     return Status::OK();
 }
 
+Status FunctionLikeBase::constant_re2_regex_fn(LikeSearchState* state, const ColumnString& val,
+                                           const StringRef& pattern,
+                                           ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        *(result.data() + i) = RE2::FullMatch(
+            re2::StringPiece(str_ref.data, str_ref.size), *state->regex.get());

Review Comment:
   warning: redundant get() call on smart pointer [readability-redundant-smartptr-get]
   
   ```suggestion
               re2::StringPiece(str_ref.data, str_ref.size), *state->regex);
   ```
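clang-tidy's readability-redundant-smartptr-get check fires because `operator*` on a standard smart pointer already yields a reference to the managed object, so `*state->regex.get()` and `*state->regex` name the same RE2 instance. A minimal sketch, assuming `regex` is a `std::unique_ptr` (its declaration is not part of this diff) and using a hypothetical `Pattern` struct in place of `re2::RE2`:

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for re2::RE2.
struct Pattern {
    std::string text;
};

// Form flagged by clang-tidy: .get() returns a raw pointer that is immediately dereferenced.
const Pattern& deref_with_get(const std::unique_ptr<Pattern>& p) { return *p.get(); }

// Suggested form: unique_ptr::operator* already returns Pattern&.
const Pattern& deref_direct(const std::unique_ptr<Pattern>& p) { return *p; }
```

Both functions return a reference to the same object, so applying the suggestion changes readability only, not behavior.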
   



##########
be/src/vec/functions/like.cpp:
##########
@@ -196,6 +196,39 @@
     return Status::OK();
 }
 
+Status FunctionLikeBase::constant_re2_regex_fn(LikeSearchState* state, const ColumnString& val,
+                                           const StringRef& pattern,
+                                           ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        *(result.data() + i) = RE2::FullMatch(
+            re2::StringPiece(str_ref.data, str_ref.size), *state->regex.get());
+    }
+
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_re2_regex_fn_predicate(LikeSearchState* state,
+                                                     const PredicateColumnType<TYPE_STRING>& val,
+                                                     const StringRef& pattern,
+                                                     ColumnUInt8::Container& result,
+                                                     const uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        *(result.data() + i) = RE2::FullMatch(
+            re2::StringPiece(data_ptr[sel[i]].data, data_ptr[sel[i]].size), *state->regex.get());

Review Comment:
   warning: redundant get() call on smart pointer [readability-redundant-smartptr-get]
   
   ```suggestion
               re2::StringPiece(data_ptr[sel[i]].data, data_ptr[sel[i]].size), *state->regex);
   ```
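In `constant_re2_regex_fn_predicate`, `sel` lists the `sz` row positions that are still selected, and `result[i]` receives the outcome for row `sel[i]`: the input is read sparsely while the output is dense. The sketch below reproduces that indexing with standard C++ only; `match_selected` is a hypothetical name, and `std::regex_match` stands in for `RE2::FullMatch`:

```cpp
#include <cstddef>
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

// Evaluates a full-match predicate only over the rows listed in `sel`.
// result[i] holds the outcome for row sel[i], mirroring the predicate loop in the diff.
std::vector<std::uint8_t> match_selected(const std::vector<std::string>& rows,
                                         const std::uint16_t* sel, std::size_t sz,
                                         const std::regex& re) {
    std::vector<std::uint8_t> result(sz);
    for (std::size_t i = 0; i < sz; i++) {
        result[i] = std::regex_match(rows[sel[i]], re) ? 1 : 0;
    }
    return result;
}
```

For rows `{"apple", "banana", "apricot", "cherry"}`, selection `{0, 2, 3}`, and pattern `ap.*`, the result is `{1, 1, 0}`: three entries, one per selected row.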
   



##########
be/src/vec/functions/like.cpp:
##########
@@ -196,6 +196,39 @@
     return Status::OK();
 }
 
+Status FunctionLikeBase::constant_re2_regex_fn(LikeSearchState* state, const ColumnString& val,
+                                           const StringRef& pattern,
+                                           ColumnUInt8::Container& result) {
+    auto sz = val.size();
+    for (size_t i = 0; i < sz; i++) {
+        const auto& str_ref = val.get_data_at(i);
+        *(result.data() + i) = RE2::FullMatch(
+            re2::StringPiece(str_ref.data, str_ref.size), *state->regex.get());
+    }
+
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_re2_regex_fn_predicate(LikeSearchState* state,
+                                                     const PredicateColumnType<TYPE_STRING>& val,
+                                                     const StringRef& pattern,
+                                                     ColumnUInt8::Container& result,
+                                                     const uint16_t* sel, size_t sz) {
+    auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
+    for (size_t i = 0; i < sz; i++) {
+        *(result.data() + i) = RE2::FullMatch(
+            re2::StringPiece(data_ptr[sel[i]].data, data_ptr[sel[i]].size), *state->regex.get());
+    }
+
+    return Status::OK();
+}
+
+Status FunctionLikeBase::constant_re2_regex_fn_scalar(LikeSearchState* state, const StringRef& val,
+                                            const StringRef& pattern, unsigned char* result) {
+    *result = RE2::FullMatch(re2::StringPiece(val.data, val.size), *state->regex.get());

Review Comment:
   warning: redundant get() call on smart pointer [readability-redundant-smartptr-get]
   
   ```suggestion
       *result = RE2::FullMatch(re2::StringPiece(val.data, val.size), *state->regex);
   ```
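This first revision calls `RE2::FullMatch`, which succeeds only when the entire input matches. The later revision of the patch instead adds the fallback inside `constant_regex_fn_scalar`, `constant_regex_fn`, and the predicate variant, using `RE2::PartialMatch`, which succeeds when any substring matches. The C++ standard library draws the same line between `std::regex_match` and `std::regex_search`; this sketch uses them as stand-ins for the RE2 calls:

```cpp
#include <regex>
#include <string>

// Whole-input match, analogous to RE2::FullMatch.
bool full_match(const std::string& s, const std::regex& re) {
    return std::regex_match(s, re);
}

// Match anywhere in the input, analogous to RE2::PartialMatch.
bool partial_match(const std::string& s, const std::regex& re) {
    return std::regex_search(s, re);
}
```

With the pattern `b+`, `"abbbc"` fails `full_match` but passes `partial_match`. Anchoring a simple pattern such as `^b+$` makes a search reject `"abbbc"` as well, because `^` and `$` (without multiline mode) only match at the ends of the input.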
   




[GitHub] [doris] morningman merged pull request #18350: [Enhencement](like) fallback to re2 if hyperscan failed

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman merged PR #18350:
URL: https://github.com/apache/doris/pull/18350



[GitHub] [doris] github-actions[bot] commented on a diff in pull request #18350: [Enhencement](like) fallback to re2 if hyperscan failed

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #18350:
URL: https://github.com/apache/doris/pull/18350#discussion_r1160447014


##########
be/src/vec/functions/like.cpp:
##########
@@ -198,10 +208,14 @@ Status FunctionLikeBase::constant_substring_fn_scalar(LikeSearchState* state, co
 
 Status FunctionLikeBase::constant_regex_fn_scalar(LikeSearchState* state, const StringRef& val,
                                                   const StringRef& pattern, unsigned char* result) {
-    auto ret = hs_scan(state->hs_database.get(), val.data, val.size, 0, state->hs_scratch.get(),
-                       doris::vectorized::LikeSearchState::hs_match_handler, (void*)result);
-    if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
-        return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+    if (state->hs_database) { // use hyperscan
+        auto ret = hs_scan(state->hs_database.get(), val.data, val.size, 0, state->hs_scratch.get(),
+                           doris::vectorized::LikeSearchState::hs_match_handler, (void*)result);
+        if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
+            return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+        }
+    } else { // fallback to re2
+        *result = RE2::PartialMatch(re2::StringPiece(val.data, val.size), *state->regex.get());

Review Comment:
   warning: redundant get() call on smart pointer [readability-redundant-smartptr-get]
   
   ```suggestion
           *result = RE2::PartialMatch(re2::StringPiece(val.data, val.size), *state->regex);
   ```
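The hunk above chooses an engine on each call. If `state->hs_database` is populated, Hyperscan scans the value. Otherwise the precompiled RE2 object in `state->regex` is used. Presumably that choice was made when the pattern was prepared; the preparation code is not part of this excerpt. The sketch below models this two-phase design with standard C++ only. `SearchState`, `prepare`, `FastCompiler`, and the `fast` callable are hypothetical stand-ins for `LikeSearchState` and the Hyperscan database/scratch pair, and `std::regex` stands in for RE2. It is not the Doris implementation:

```cpp
#include <functional>
#include <memory>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical per-value matcher produced by the primary (fast) engine.
using RowMatcher = std::function<bool(const std::string&)>;
// Hypothetical compiler for the primary engine; std::nullopt models a compile failure.
using FastCompiler = std::function<std::optional<RowMatcher>(const std::string&)>;

// Hypothetical analogue of LikeSearchState: preparation fills exactly one engine.
struct SearchState {
    RowMatcher fast;                    // stands in for hs_database + hs_scratch
    std::unique_ptr<std::regex> regex;  // stands in for the RE2 fallback
};

// Try the primary engine first and build the fallback only if it fails to compile.
// std::regex throws std::regex_error if the fallback cannot compile the pattern either.
SearchState prepare(const std::string& pattern, const FastCompiler& compile_fast) {
    SearchState state;
    if (std::optional<RowMatcher> fast = compile_fast(pattern)) {
        state.fast = std::move(*fast);
    } else {
        state.regex = std::make_unique<std::regex>(pattern);
    }
    return state;
}

// Per-value dispatch, mirroring `if (state->hs_database) { ... } else { ... }`.
bool regex_fn_scalar(const SearchState& state, const std::string& val) {
    if (state.fast) {
        return state.fast(val);
    }
    return std::regex_search(val, *state.regex);  // partial match, like RE2::PartialMatch
}
```

In the vectorized variants of this diff, the branch is hoisted outside the row loop. It is therefore evaluated once per column rather than once per value.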
   



##########
be/src/vec/functions/like.cpp:
##########
@@ -213,31 +227,50 @@
 
     hs_database_t* database = nullptr;
     hs_scratch_t* scratch = nullptr;
-    RETURN_IF_ERROR(hs_prepare(nullptr, re_pattern.c_str(), &database, &scratch));
+    if (hs_prepare(nullptr, re_pattern.c_str(), &database, &scratch).ok()) { // use hyperscan
+        auto ret = hs_scan(database, val.data, val.size, 0, scratch,
+                           doris::vectorized::LikeSearchState::hs_match_handler, (void*)result);
+        if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
+            return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+        }
 
-    auto ret = hs_scan(database, val.data, val.size, 0, scratch,
-                       doris::vectorized::LikeSearchState::hs_match_handler, (void*)result);
-    if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
-        return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+        hs_free_scratch(scratch);
+        hs_free_database(database);
+    } else { // fallback to re2
+        RE2::Options opts;
+        opts.set_never_nl(false);
+        opts.set_dot_nl(true);
+        re2::RE2 re(re_pattern, opts);
+        if (re.ok()) {
+            *result = RE2::PartialMatch(re2::StringPiece(val.data, val.size), re);
+        } else {
+            return Status::RuntimeError("Invalid pattern: {}", pattern.debug_string());
+        }
     }
 
-    hs_free_scratch(scratch);
-    hs_free_database(database);
-
     return Status::OK();
 }
 
 Status FunctionLikeBase::constant_regex_fn(LikeSearchState* state, const ColumnString& val,
                                            const StringRef& pattern,
                                            ColumnUInt8::Container& result) {
     auto sz = val.size();
-    for (size_t i = 0; i < sz; i++) {
-        const auto& str_ref = val.get_data_at(i);
-        auto ret = hs_scan(
-                state->hs_database.get(), str_ref.data, str_ref.size, 0, state->hs_scratch.get(),
-                doris::vectorized::LikeSearchState::hs_match_handler, (void*)(result.data() + i));
-        if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
-            return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+    if (state->hs_database) { // use hyperscan
+        for (size_t i = 0; i < sz; i++) {
+            const auto& str_ref = val.get_data_at(i);
+            auto ret = hs_scan(state->hs_database.get(), str_ref.data, str_ref.size, 0,
+                               state->hs_scratch.get(),
+                               doris::vectorized::LikeSearchState::hs_match_handler,
+                               (void*)(result.data() + i));
+            if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
+                return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+            }
+        }
+    } else { // fallback to re2
+        for (size_t i = 0; i < sz; i++) {
+            const auto& str_ref = val.get_data_at(i);
+            *(result.data() + i) = RE2::PartialMatch(re2::StringPiece(str_ref.data, str_ref.size),
+                                                     *state->regex.get());

Review Comment:
   warning: redundant get() call on smart pointer [readability-redundant-smartptr-get]
   
   ```suggestion
                                                        *state->regex);
   ```
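In the scalar path above, `hs_free_scratch` and `hs_free_database` run only after a successful `hs_scan`. As written, the early `return Status::RuntimeError(...)` on a scan error appears to skip them, and the pre-change code had the same shape. One conventional remedy is to own raw C handles with `std::unique_ptr` and a custom deleter. The state object already seems to do this for `hs_database` and `hs_scratch`, judging by the `.get()` calls. The following is a minimal sketch. `fake_db_t`, `fake_db_alloc`, `fake_db_free`, and `scan_once` are hypothetical stand-ins, not Hyperscan's API:

```cpp
#include <memory>

// Hypothetical C-style handle standing in for hs_database_t / hs_scratch_t.
struct fake_db_t {
    int id;
};

// Counts handles that have been allocated but not yet freed.
static int g_live_handles = 0;

// Hypothetical allocate/free pair in the style of a C API.
fake_db_t* fake_db_alloc() {
    ++g_live_handles;
    return new fake_db_t{0};
}

void fake_db_free(fake_db_t* db) {
    if (db != nullptr) {
        --g_live_handles;
        delete db;
    }
}

// Custom deleter so std::unique_ptr calls the C-style free function.
struct FakeDbDeleter {
    void operator()(fake_db_t* db) const { fake_db_free(db); }
};
using DbPtr = std::unique_ptr<fake_db_t, FakeDbDeleter>;

// Simulates one prepare/scan/free cycle; returns false on a simulated scan error.
bool scan_once(bool simulate_error) {
    DbPtr db(fake_db_alloc());
    if (simulate_error) {
        return false;  // early return: ~DbPtr still calls fake_db_free
    }
    // A real implementation would scan with db.get() here.
    return true;
}
```

Either return path leaves `g_live_handles` at zero, because the destructor releases the handle no matter how the function exits.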
   



##########
be/src/vec/functions/like.cpp:
##########
@@ -275,13 +323,22 @@
                                                      ColumnUInt8::Container& result,
                                                      const uint16_t* sel, size_t sz) {
     auto data_ptr = reinterpret_cast<const StringRef*>(val.get_data().data());
-    for (size_t i = 0; i < sz; i++) {
-        auto ret = hs_scan(state->hs_database.get(), data_ptr[sel[i]].data, data_ptr[sel[i]].size,
-                           0, state->hs_scratch.get(),
-                           doris::vectorized::LikeSearchState::hs_match_handler,
-                           (void*)(result.data() + i));
-        if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
-            return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+
+    if (state->hs_database) { // use hyperscan
+        for (size_t i = 0; i < sz; i++) {
+            auto ret = hs_scan(state->hs_database.get(), data_ptr[sel[i]].data,
+                               data_ptr[sel[i]].size, 0, state->hs_scratch.get(),
+                               doris::vectorized::LikeSearchState::hs_match_handler,
+                               (void*)(result.data() + i));
+            if (ret != HS_SUCCESS && ret != HS_SCAN_TERMINATED) {
+                return Status::RuntimeError(fmt::format("hyperscan error: {}", ret));
+            }
+        }
+    } else { // fallback to re2
+        for (size_t i = 0; i < sz; i++) {
+            *(result.data() + i) = RE2::PartialMatch(
+                    re2::StringPiece(data_ptr[sel[i]].data, data_ptr[sel[i]].size),
+                    *state->regex.get());

Review Comment:
   warning: redundant get() call on smart pointer [readability-redundant-smartptr-get]
   
   ```suggestion
                       *state->regex);
   ```
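The scalar RE2 fallback earlier in this diff sets `opts.set_never_nl(false)` and `opts.set_dot_nl(true)`. According to RE2's option documentation, `dot_nl` lets `.` match `\n`. `never_nl` set to false, which is also RE2's default, allows matches to span newlines. As a result, values containing line breaks can still match patterns such as `a.*b`. `std::regex` in ECMAScript mode has no equivalent switch, because `.` excludes line terminators. This sketch shows the difference and one common workaround, `(?:.|\n)`. It illustrates the option's effect only and is not how Doris builds its patterns:

```cpp
#include <regex>
#include <string>

// ECMAScript '.' does not match line terminators such as '\n'.
bool dot_default(const std::string& s) {
    return std::regex_match(s, std::regex("a.b"));
}

// Alternating with an explicit "\n" approximates RE2's dot_nl for newline input.
// Note: this covers '\n' only; ECMAScript '.' also excludes '\r'.
bool dot_with_newline(const std::string& s) {
    return std::regex_match(s, std::regex("a(?:.|\\n)b"));
}
```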
   




[GitHub] [doris] xiaokang commented on pull request #18350: [Enhencement](like) fallback to re2 if hyperscan failed

Posted by "xiaokang (via GitHub)" <gi...@apache.org>.
xiaokang commented on PR #18350:
URL: https://github.com/apache/doris/pull/18350#issuecomment-1499952212

   run buildall

