You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by "zclllyybb (via GitHub)" <gi...@apache.org> on 2023/04/14 10:04:54 UTC

[GitHub] [doris] zclllyybb opened a new pull request, #18682: [BugFix](functions) fix multi_search_all_positions

zclllyybb opened a new pull request, #18682:
URL: https://github.com/apache/doris/pull/18682

   # Proposed changes
   
   before:
   ```
   +---------------+----------------------------------+-----------------------------------------------+
   | content       | mode                             | multi_search_all_positions(`content`, `mode`) |
   +---------------+----------------------------------+-----------------------------------------------+
   | Hello, World! | ['hello', 'world']               | [0, 0]                                        |
   | Hello, World! | ['hello', 'world', 'Hello', '!'] | [0, 0, 0, 0]                                  |
   | hello, world! | ['Hello', 'world']               | [0, 0]                                        |
   | hello, world! | ['hello', 'world', 'Hello', '!'] | [0, 0, 0, 0]                                  |
   | HHHHW!        | ['H', 'HHHH', 'HW', 'WH']        | [0, 0, 0, 0]                                  |
   +---------------+----------------------------------+-----------------------------------------------+
   ```
   
   after:
   ```
   +---------------+----------------------------------+-----------------------------------------------+
   | content       | mode                             | multi_search_all_positions(`content`, `mode`) |
   +---------------+----------------------------------+-----------------------------------------------+
   | Hello, World! | ['hello', 'world']               | [0, 0]                                        |
   | Hello, World! | ['hello', 'world', 'Hello', '!'] | [0, 0, 1, 13]                                 |
   | hello, world! | ['Hello', 'world']               | [0, 8]                                        |
   | hello, world! | ['hello', 'world', 'Hello', '!'] | [1, 8, 0, 13]                                 |
   | HHHHW!        | ['H', 'HHHH', 'HW', 'WH']        | [1, 1, 4, 0]                                  |
   +---------------+----------------------------------+-----------------------------------------------+
   ```
   
   ## Problem summary
   
   Completely rewrite the function. now it search for every needle's **first** occur in haystack.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [x] Has unit tests been added
   * [x] Has document been added or modified
   * [ ] Does it need to update dependencies
   * [x] Is this PR support rollback (If NO, please explain WHY)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18682: [BugFix](functions) fix multi_search_all_positions

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18682:
URL: https://github.com/apache/doris/pull/18682#issuecomment-1508273022

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] HappenLee commented on a diff in pull request #18682: [BugFix](functions) fix multi_search_all_positions

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on code in PR #18682:
URL: https://github.com/apache/doris/pull/18682#discussion_r1166655674


##########
be/src/vec/functions/functions_multi_string_position.cpp:
##########
@@ -103,8 +105,19 @@ class FunctionMultiStringPosition : public IFunction {
     }
 };
 
-template <typename Impl>
 struct FunctionMultiSearchAllPositionsImpl {
+private:
+    using SingleSearcher = ASCIICaseSensitiveStringSearcher;
+    static SingleSearcher create_multi_searcher(const StringRef& needle) {

Review Comment:
   only use in `FunctionMultiSearchAllPositionsImpl ` do not need static



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei merged pull request #18682: [BugFix](functions) fix multi_search_all_positions

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei merged PR #18682:
URL: https://github.com/apache/doris/pull/18682


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] HappenLee commented on a diff in pull request #18682: [BugFix](functions) fix multi_search_all_positions

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee commented on code in PR #18682:
URL: https://github.com/apache/doris/pull/18682#discussion_r1166659956


##########
be/src/vec/functions/functions_multi_string_position.cpp:
##########
@@ -113,38 +126,50 @@ struct FunctionMultiSearchAllPositionsImpl {
                                   const ColumnString::Offsets& haystack_offsets,
                                   const Array& needles_arr, PaddedPODArray<Int32>& vec_res,
                                   PaddedPODArray<UInt64>& offsets_res) {
-        if (needles_arr.size() > std::numeric_limits<UInt8>::max())
+        if (needles_arr.size() > std::numeric_limits<UInt8>::max()) {
             return Status::InvalidArgument(
                     "number of arguments for function {} doesn't match: "
                     "passed {}, should be at most 255",
                     name, needles_arr.size());
+        }
 
         std::vector<StringRef> needles;
         needles.reserve(needles_arr.size());
-        for (const auto& needle : needles_arr) needles.emplace_back(needle.get<StringRef>());
-
-        auto res_callback = [](const UInt8* start, const UInt8* end) -> Int32 {
-            return 1 + Impl::count_chars(reinterpret_cast<const char*>(start),
-                                         reinterpret_cast<const char*>(end));
-        };
-
-        auto searcher = Impl::create_multi_searcher(needles);
+        for (const auto& needle : needles_arr) {
+            needles.emplace_back(needle.get<StringRef>());
+        }
 
-        const size_t haystack_size = haystack_offsets.size();
         const size_t needles_size = needles.size();
+        std::vector<SingleSearcher> searchers;
+        searchers.reserve(needles_size);
+        for (auto needle : needles) {
+            searchers.emplace_back(needle.data, needle.size);

Review Comment:
   why do the work in up for-loop ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18682: [BugFix](functions) fix multi_search_all_positions

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18682:
URL: https://github.com/apache/doris/pull/18682#issuecomment-1510191089

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18682: [BugFix](functions) fix multi_search_all_positions

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18682:
URL: https://github.com/apache/doris/pull/18682#issuecomment-1510191082

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zclllyybb commented on pull request #18682: [BugFix](functions) fix multi_search_all_positions

Posted by "zclllyybb (via GitHub)" <gi...@apache.org>.
zclllyybb commented on PR #18682:
URL: https://github.com/apache/doris/pull/18682#issuecomment-1509762274

   run p1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #18682: [BugFix](functions) fix multi_search_all_positions

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18682:
URL: https://github.com/apache/doris/pull/18682#issuecomment-1509611565

   clang-tidy review says "All clean, LGTM! :+1:"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] zclllyybb commented on pull request #18682: [BugFix](functions) fix multi_search_all_positions

Posted by "zclllyybb (via GitHub)" <gi...@apache.org>.
zclllyybb commented on PR #18682:
URL: https://github.com/apache/doris/pull/18682#issuecomment-1509610449

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org