Posted to commits@doris.apache.org by "ZhangYu0123 (via GitHub)" <gi...@apache.org> on 2023/04/07 11:56:14 UTC

[GitHub] [doris] ZhangYu0123 opened a new pull request, #18474: [refactor](string) remove volnitsky search algorithm

ZhangYu0123 opened a new pull request, #18474:
URL: https://github.com/apache/doris/pull/18474

   # Proposed changes
   Remove the Volnitsky search algorithm.
   Replace it everywhere with the SIMD-based two-character string search.
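
The two-character search named in the description can be illustrated with a scalar sketch. This is not the Doris implementation: `two_char_search` is a hypothetical helper, and a SIMD version would apply the same first-two-bytes filter to many haystack positions per instruction instead of one at a time. The sketch only shows the filtering idea: reject a position cheaply on the first two needle bytes, and run the full comparison only on survivors.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration only (not a Doris API).
// Returns a pointer to the first occurrence of needle in haystack,
// or nullptr if the needle does not occur.
inline const char* two_char_search(const char* haystack, std::size_t haystack_size,
                                   const char* needle, std::size_t needle_size) {
    if (needle_size == 0) {
        return haystack; // an empty needle matches at the start
    }
    if (needle_size > haystack_size) {
        return nullptr;
    }
    const char first = needle[0];
    const char second = needle_size > 1 ? needle[1] : '\0';
    // Last position at which a full needle still fits in the haystack.
    const char* last_start = haystack + (haystack_size - needle_size);
    for (const char* pos = haystack; pos <= last_start; ++pos) {
        // Cheap filter: most positions are rejected on the first two bytes.
        if (pos[0] != first) {
            continue;
        }
        if (needle_size > 1 && pos[1] != second) {
            continue;
        }
        // Full verification only for candidates that passed the filter.
        if (std::memcmp(pos, needle, needle_size) == 0) {
            return pos;
        }
    }
    return nullptr;
}
```

Unlike Volnitsky, this approach needs no per-needle hash table, so setup cost is close to zero.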
   
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Have unit tests been added
   * [ ] Has documentation been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Does this PR support rollback (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on a diff in pull request #18474: [refactor](string) remove volnitsky search algorithm

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #18474:
URL: https://github.com/apache/doris/pull/18474#discussion_r1160654277


##########
be/src/vec/common/string_searcher.h:
##########
@@ -400,4 +400,65 @@
     }
 };
 
+template <typename StringSearcher>
+class MultiStringSearcherBase {
+private:
+    /// needles and their offsets
+    const std::vector<StringRef>& needles;
+    /// searchers
+    std::vector<StringSearcher> searchers;
+    /// last index of offsets that was not processed
+    size_t last;
+
+public:
+    explicit MultiStringSearcherBase(const std::vector<StringRef>& needles_)

Review Comment:
   warning: use of undeclared identifier 'StringRef' [clang-diagnostic-error]
   ```cpp
       explicit MultiStringSearcherBase(const std::vector<StringRef>& needles_)
                                                          ^
   ```
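
A diagnostic like this usually means the type is not declared at the point where the template is parsed, so the header that declares `StringRef` has to be included before the class (its path is not shown in this thread). The sketch below is a self-contained illustration under assumptions: the `StringRef` struct is a stand-in with an assumed pointer-plus-size layout, and `ToySearcher` and `searcherCount()` are hypothetical names added only so the template can be instantiated and checked. It also uses `std::size_t` for the loop index, which avoids the signed/unsigned comparison in the quoted constructor.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the project's StringRef (assumed layout: pointer + size).
// In the real code this declaration would come from an included header.
struct StringRef {
    const char* data;
    std::size_t size;
};

// Hypothetical minimal searcher, present only to instantiate the template.
struct ToySearcher {
    std::string needle;
    ToySearcher(const char* data, std::size_t size) : needle(data, size) {}
};

template <typename StringSearcher>
class MultiStringSearcherBase {
private:
    /// needles and their offsets
    const std::vector<StringRef>& needles;
    /// one searcher per needle
    std::vector<StringSearcher> searchers;
    /// index of the next needle that has not been processed
    std::size_t last;

public:
    explicit MultiStringSearcherBase(const std::vector<StringRef>& needles_)
            : needles {needles_}, last {0} {
        searchers.reserve(needles.size());
        // Unsigned index matches the type of needles.size().
        for (std::size_t i = 0; i < needles.size(); ++i) {
            searchers.emplace_back(needles[i].data, needles[i].size);
        }
    }

    bool hasMoreToSearch() const { return last < needles.size(); }

    // Hypothetical accessor, added for the sketch only.
    std::size_t searcherCount() const { return searchers.size(); }
};
```

Because the class stores a reference to the needle vector, the vector must outlive the searcher object.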
   



##########
be/src/vec/common/string_searcher.h:
##########
@@ -400,4 +400,65 @@ struct LibCASCIICaseInsensitiveStringSearcher : public StringSearcherBase {
     }
 };
 
+template <typename StringSearcher>
+class MultiStringSearcherBase {
+private:
+    /// needles and their offsets
+    const std::vector<StringRef>& needles;

Review Comment:
   warning: use of undeclared identifier 'StringRef' [clang-diagnostic-error]
   ```cpp
       const std::vector<StringRef>& needles;
                         ^
   ```
   



##########
be/src/vec/common/string_searcher.h:
##########
@@ -400,4 +400,65 @@
     }
 };
 
+template <typename StringSearcher>
+class MultiStringSearcherBase {
+private:
+    /// needles and their offsets
+    const std::vector<StringRef>& needles;
+    /// searchers
+    std::vector<StringSearcher> searchers;
+    /// last index of offsets that was not processed
+    size_t last;
+
+public:
+    explicit MultiStringSearcherBase(const std::vector<StringRef>& needles_)
+            : needles {needles_}, last {0} {
+        searchers.reserve(needles.size());
+
+        size_t size = needles.size();
+        for (int i = 0; i < size; ++i) {
+            const char* cur_needle_data = needles[i].data;
+            const size_t cur_needle_size = needles[i].size;
+
+            searchers.emplace_back(cur_needle_data, cur_needle_size);
+        }
+    }
+
+    /**
+     * while (hasMoreToSearch())
+     * {
+     *     search inside the haystack with the known needles
+     * }
+     */
+    bool hasMoreToSearch() {
+        if (last >= needles.size()) return false;

Review Comment:
   warning: statement should be inside braces [readability-braces-around-statements]
   
   ```suggestion
           if (last >= needles.size()) {
               return false;
           }
   ```
   



##########
be/src/vec/common/string_searcher.h:
##########
@@ -400,4 +400,65 @@
     }
 };
 
+template <typename StringSearcher>
+class MultiStringSearcherBase {
+private:
+    /// needles and their offsets
+    const std::vector<StringRef>& needles;
+    /// searchers
+    std::vector<StringSearcher> searchers;
+    /// last index of offsets that was not processed
+    size_t last;
+
+public:
+    explicit MultiStringSearcherBase(const std::vector<StringRef>& needles_)
+            : needles {needles_}, last {0} {
+        searchers.reserve(needles.size());
+
+        size_t size = needles.size();
+        for (int i = 0; i < size; ++i) {
+            const char* cur_needle_data = needles[i].data;
+            const size_t cur_needle_size = needles[i].size;
+
+            searchers.emplace_back(cur_needle_data, cur_needle_size);
+        }
+    }
+
+    /**
+     * while (hasMoreToSearch())
+     * {
+     *     search inside the haystack with the known needles
+     * }
+     */
+    bool hasMoreToSearch() {
+        if (last >= needles.size()) return false;
+
+        return true;
+    }
+
+    bool searchOne(const uint8_t* haystack, const uint8_t* haystack_end) {
+        const size_t size = needles.size();
+        if (last >= size) return false;

Review Comment:
   warning: statement should be inside braces [readability-braces-around-statements]
   
   ```suggestion
           if (last >= size) {
               return false;
           }
   ```
   





[GitHub] [doris] github-actions[bot] commented on pull request #18474: [refactor](string) remove volnitsky search algorithm

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18474:
URL: https://github.com/apache/doris/pull/18474#issuecomment-1500236711

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] github-actions[bot] commented on pull request #18474: [refactor](string) remove volnitsky search algorithm

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18474:
URL: https://github.com/apache/doris/pull/18474#issuecomment-1500791905

   clang-tidy review says "All clean, LGTM! :+1:"




[GitHub] [doris] HappenLee merged pull request #18474: [refactor](string) remove volnitsky search algorithm

Posted by "HappenLee (via GitHub)" <gi...@apache.org>.
HappenLee merged PR #18474:
URL: https://github.com/apache/doris/pull/18474




[GitHub] [doris] ZhangYu0123 commented on pull request #18474: [refactor](string) remove volnitsky search algorithm

Posted by "ZhangYu0123 (via GitHub)" <gi...@apache.org>.
ZhangYu0123 commented on PR #18474:
URL: https://github.com/apache/doris/pull/18474#issuecomment-1500237842

   run buildall




[GitHub] [doris] github-actions[bot] commented on pull request #18474: [refactor](string) remove volnitsky search algorithm

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18474:
URL: https://github.com/apache/doris/pull/18474#issuecomment-1501325355

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] ZhangYu0123 commented on pull request #18474: [refactor](string) remove volnitsky search algorithm

Posted by "ZhangYu0123 (via GitHub)" <gi...@apache.org>.
ZhangYu0123 commented on PR #18474:
URL: https://github.com/apache/doris/pull/18474#issuecomment-1500792032

   run buildall




[GitHub] [doris] github-actions[bot] commented on pull request #18474: [refactor](string) remove volnitsky search algorithm

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #18474:
URL: https://github.com/apache/doris/pull/18474#issuecomment-1501325377

   PR approved by anyone and no changes requested.

