You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/08/15 11:11:12 UTC

[GitHub] [nifi-minifi-cpp] adamdebreceni opened a new pull request, #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

adamdebreceni opened a new pull request, #1387:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387

   Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
        in the commit message?
   
   - [ ] Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically main)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   ### For code changes:
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the LICENSE file?
   - [ ] If applicable, have you updated the NOTICE file?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI results for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] szaszm commented on a diff in pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

Posted by GitBox <gi...@apache.org>.
szaszm commented on code in PR #1387:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387#discussion_r974441805


##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -279,13 +372,39 @@ SMatch getLastRegexMatch(const std::string& string, const utils::Regex& regex) {
     current_str = search_result.suffix();
   }
 
-  auto diff = string.size() - last_match.string_.size();
-  last_match.string_ = string;
+  if (!last_match.ready()) {
+    return last_match;
+  }
+
+  struct MatchInfo {
+    bool matched;
+    size_t begin;
+    size_t end;
+  };
+
+  // we must save the sub matches' info in a way that does
+  // not get invalidated by SMatch::reset, and can be transferred
+  // to the updated match
+  std::vector<MatchInfo> match_infos;
+  match_infos.reserve(last_match.size());
   for (auto& match : last_match.matches_) {
-    if (match.match.rm_so >= 0) {
-      match.match.rm_so += diff;
-      match.match.rm_eo += diff;
-    }
+    match_infos.emplace_back(MatchInfo{
+      .matched = match.matched,
+      .begin = gsl::narrow<size_t>(std::distance(last_match.string_.cbegin(), match.first)),
+      .end = gsl::narrow<size_t>(std::distance(last_match.string_.cbegin(), match.second))
+    });
+  }
+  // offset of the start of the last match into the original string
+  auto offset = string.size() - last_match.string_.size();
+  last_match.reset(string);
+  last_match.ready_ = true;
+  for (auto& info : match_infos) {
+    size_t match_off = info.matched ? offset : 0;
+    last_match.matches_.emplace_back(SMatch::Regmatch{

Review Comment:
   I would use push_back here for stronger type checking.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a diff in pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on code in PR #1387:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387#discussion_r976180796


##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -279,13 +372,39 @@ SMatch getLastRegexMatch(const std::string& string, const utils::Regex& regex) {
     current_str = search_result.suffix();
   }
 
-  auto diff = string.size() - last_match.string_.size();
-  last_match.string_ = string;
+  if (!last_match.ready()) {
+    return last_match;
+  }
+
+  struct MatchInfo {
+    bool matched;
+    size_t begin;
+    size_t end;
+  };
+
+  // we must save the sub matches' info in a way that does
+  // not get invalidated by SMatch::reset, and can be transferred
+  // to the updated match
+  std::vector<MatchInfo> match_infos;
+  match_infos.reserve(last_match.size());
   for (auto& match : last_match.matches_) {
-    if (match.match.rm_so >= 0) {
-      match.match.rm_so += diff;
-      match.match.rm_eo += diff;
-    }
+    match_infos.emplace_back(MatchInfo{
+      .matched = match.matched,
+      .begin = gsl::narrow<size_t>(std::distance(last_match.string_.cbegin(), match.first)),
+      .end = gsl::narrow<size_t>(std::distance(last_match.string_.cbegin(), match.second))
+    });
+  }
+  // offset of the start of the last match into the original string
+  auto offset = string.size() - last_match.string_.size();
+  last_match.reset(string);
+  last_match.ready_ = true;
+  for (auto& info : match_infos) {
+    size_t match_off = info.matched ? offset : 0;
+    last_match.matches_.emplace_back(SMatch::Regmatch{

Review Comment:
   @fgerlits noticed that clang-tidy advocated (or will advocate if the check is turned on) for using `emplace_back` instead of `push_back`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a diff in pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on code in PR #1387:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387#discussion_r966924253


##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -22,58 +22,86 @@
 #include <vector>
 
 #include "Exception.h"
+#include <regex.h>

Review Comment:
   removed



##########
libminifi/include/utils/RegexUtils.h:
##########
@@ -150,18 +132,32 @@ class Regex {
   int regex_mode_;
 #endif
 
-  friend bool regexMatch(const std::string &string, const Regex& regex);
-  friend bool regexMatch(const std::string &string, SMatch& match, const Regex& regex);
-  friend bool regexSearch(const std::string &string, const Regex& regex);
-  friend bool regexSearch(const std::string &string, SMatch& match, const Regex& regex);
+  friend bool regexMatch(const char* str, CMatch& match, const Regex& regex);
+  friend bool regexSearch(const char* str, CMatch& match, const Regex& regex);
+
+  friend bool regexMatch(const std::string_view& str, SVMatch& match, const Regex& regex);
+  friend bool regexSearch(const std::string_view& str, SVMatch& match, const Regex& regex);
+
+  friend bool regexMatch(const std::string& str, SMatch& match, const Regex& regex);
+  friend bool regexSearch(const std::string& str, SMatch& match, const Regex& regex);
+
   friend SMatch getLastRegexMatch(const std::string& string, const utils::Regex& regex);
 };
 
-bool regexMatch(const std::string &string, const Regex& regex);
-bool regexMatch(const std::string &string, SMatch& match, const Regex& regex);
+bool regexMatch(const char* str, const Regex& regex);
+bool regexMatch(const char* str, CMatch& match, const Regex& regex);
+bool regexSearch(const char* str, const Regex& regex);
+bool regexSearch(const char* str, CMatch& match, const Regex& regex);
+
+bool regexMatch(const std::string_view& str, const Regex& regex);
+bool regexMatch(const std::string_view& str, SVMatch& match, const Regex& regex);
+bool regexSearch(const std::string_view& str, const Regex& regex);
+bool regexSearch(const std::string_view& str, SVMatch& match, const Regex& regex);

Review Comment:
   added tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] fgerlits commented on a diff in pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

Posted by GitBox <gi...@apache.org>.
fgerlits commented on code in PR #1387:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387#discussion_r964051600


##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -22,58 +22,86 @@
 #include <vector>
 
 #include "Exception.h"
+#include <regex.h>
+
+namespace org::apache::nifi::minifi::utils {
 
 #ifndef NO_MORE_REGFREEE
-namespace {
 
-std::size_t getMaxGroupCountOfRegex(const std::string& regex) {
-  return std::count(regex.begin(), regex.end(), '(') + 1;
+SMatch::SMatch(const SMatch& other) {
+  *this = other;
 }
 
-}  // namespace
-#endif
-
-namespace org::apache::nifi::minifi::utils {
+SMatch::SMatch(SMatch&& other) {
+  *this = std::move(other);
+}
 
-#ifndef NO_MORE_REGFREEE
-SMatch::SuffixWrapper SMatch::suffix() const {
-  if ((size_t) matches_[0].match.rm_eo >= string_.size()) {
-    return SuffixWrapper{std::string()};
-  } else {
-    return SuffixWrapper{string_.substr(matches_[0].match.rm_eo)};
+SMatch& SMatch::operator=(const SMatch& other) {
+  if (this == &other) {
+    return *this;
+  }
+  reset(other.string_);
+  matches_.reserve(other.matches_.size());
+  ready_ = other.ready_;
+  for (const auto& sub_match : other.matches_) {
+    size_t begin_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.first));
+    size_t end_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.second));
+    matches_.push_back(Regmatch{sub_match.matched, string_.begin() + begin_off, string_.begin() + end_off});
   }
+  return *this;
+}
+
+SMatch& SMatch::operator=(SMatch&& other) {
+  // trigger the copy assignment, we could optimize this (by moving the string/matches)
+  // but we would need to maintain a separate offsets vector, as after the move the original
+  // sub_matches' iterators are invalidated, if this turns out to be a performance bottleneck
+  // revisit this
+  return *this = other;
+}

Review Comment:
   This move had to be changed to a copy because we are storing iterators instead of offsets now.  What is the advantage of storing iterators?  Could we go back to storing offsets?



##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -22,58 +22,86 @@
 #include <vector>
 
 #include "Exception.h"
+#include <regex.h>

Review Comment:
   Is this `#include` necessary?  `regex.h` is already included in the header in the non-standard-lib case; also both MSVC and the linter dislike it.



##########
libminifi/include/utils/RegexUtils.h:
##########
@@ -150,18 +132,32 @@ class Regex {
   int regex_mode_;
 #endif
 
-  friend bool regexMatch(const std::string &string, const Regex& regex);
-  friend bool regexMatch(const std::string &string, SMatch& match, const Regex& regex);
-  friend bool regexSearch(const std::string &string, const Regex& regex);
-  friend bool regexSearch(const std::string &string, SMatch& match, const Regex& regex);
+  friend bool regexMatch(const char* str, CMatch& match, const Regex& regex);
+  friend bool regexSearch(const char* str, CMatch& match, const Regex& regex);
+
+  friend bool regexMatch(const std::string_view& str, SVMatch& match, const Regex& regex);
+  friend bool regexSearch(const std::string_view& str, SVMatch& match, const Regex& regex);
+
+  friend bool regexMatch(const std::string& str, SMatch& match, const Regex& regex);
+  friend bool regexSearch(const std::string& str, SMatch& match, const Regex& regex);
+
   friend SMatch getLastRegexMatch(const std::string& string, const utils::Regex& regex);
 };
 
-bool regexMatch(const std::string &string, const Regex& regex);
-bool regexMatch(const std::string &string, SMatch& match, const Regex& regex);
+bool regexMatch(const char* str, const Regex& regex);
+bool regexMatch(const char* str, CMatch& match, const Regex& regex);
+bool regexSearch(const char* str, const Regex& regex);
+bool regexSearch(const char* str, CMatch& match, const Regex& regex);
+
+bool regexMatch(const std::string_view& str, const Regex& regex);
+bool regexMatch(const std::string_view& str, SVMatch& match, const Regex& regex);
+bool regexSearch(const std::string_view& str, const Regex& regex);
+bool regexSearch(const std::string_view& str, SVMatch& match, const Regex& regex);

Review Comment:
   Can you add some unit tests for these new overloads, please?



##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -22,58 +22,86 @@
 #include <vector>
 
 #include "Exception.h"
+#include <regex.h>
+
+namespace org::apache::nifi::minifi::utils {
 
 #ifndef NO_MORE_REGFREEE
-namespace {
 
-std::size_t getMaxGroupCountOfRegex(const std::string& regex) {
-  return std::count(regex.begin(), regex.end(), '(') + 1;
+SMatch::SMatch(const SMatch& other) {
+  *this = other;
 }
 
-}  // namespace
-#endif
-
-namespace org::apache::nifi::minifi::utils {
+SMatch::SMatch(SMatch&& other) {
+  *this = std::move(other);
+}
 
-#ifndef NO_MORE_REGFREEE
-SMatch::SuffixWrapper SMatch::suffix() const {
-  if ((size_t) matches_[0].match.rm_eo >= string_.size()) {
-    return SuffixWrapper{std::string()};
-  } else {
-    return SuffixWrapper{string_.substr(matches_[0].match.rm_eo)};
+SMatch& SMatch::operator=(const SMatch& other) {
+  if (this == &other) {
+    return *this;
+  }
+  reset(other.string_);
+  matches_.reserve(other.matches_.size());
+  ready_ = other.ready_;
+  for (const auto& sub_match : other.matches_) {
+    size_t begin_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.first));
+    size_t end_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.second));
+    matches_.push_back(Regmatch{sub_match.matched, string_.begin() + begin_off, string_.begin() + end_off});

Review Comment:
   `clang-tidy` wants this changed to `emplace_back`, and although it looks like this check is not turned on yet, it will be eventually, so let's change it (also in a few other places in this file)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a diff in pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on code in PR #1387:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387#discussion_r966812547


##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -22,58 +22,86 @@
 #include <vector>
 
 #include "Exception.h"
+#include <regex.h>
+
+namespace org::apache::nifi::minifi::utils {
 
 #ifndef NO_MORE_REGFREEE
-namespace {
 
-std::size_t getMaxGroupCountOfRegex(const std::string& regex) {
-  return std::count(regex.begin(), regex.end(), '(') + 1;
+SMatch::SMatch(const SMatch& other) {
+  *this = other;
 }
 
-}  // namespace
-#endif
-
-namespace org::apache::nifi::minifi::utils {
+SMatch::SMatch(SMatch&& other) {
+  *this = std::move(other);
+}
 
-#ifndef NO_MORE_REGFREEE
-SMatch::SuffixWrapper SMatch::suffix() const {
-  if ((size_t) matches_[0].match.rm_eo >= string_.size()) {
-    return SuffixWrapper{std::string()};
-  } else {
-    return SuffixWrapper{string_.substr(matches_[0].match.rm_eo)};
+SMatch& SMatch::operator=(const SMatch& other) {
+  if (this == &other) {
+    return *this;
+  }
+  reset(other.string_);
+  matches_.reserve(other.matches_.size());
+  ready_ = other.ready_;
+  for (const auto& sub_match : other.matches_) {
+    size_t begin_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.first));
+    size_t end_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.second));
+    matches_.push_back(Regmatch{sub_match.matched, string_.begin() + begin_off, string_.begin() + end_off});
   }
+  return *this;
+}
+
+SMatch& SMatch::operator=(SMatch&& other) {
+  // trigger the copy assignment, we could optimize this (by moving the string/matches)
+  // but we would need to maintain a separate offsets vector, as after the move the original
+  // sub_matches' iterators are invalidated, if this turns out to be a performance bottleneck
+  // revisit this
+  return *this = other;
+}

Review Comment:
   an `SMatch` object would still have to contain a reference to the stored string, which would need to be updated on copy/move



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] szaszm closed pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

Posted by GitBox <gi...@apache.org>.
szaszm closed pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a diff in pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on code in PR #1387:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387#discussion_r966812547


##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -22,58 +22,86 @@
 #include <vector>
 
 #include "Exception.h"
+#include <regex.h>
+
+namespace org::apache::nifi::minifi::utils {
 
 #ifndef NO_MORE_REGFREEE
-namespace {
 
-std::size_t getMaxGroupCountOfRegex(const std::string& regex) {
-  return std::count(regex.begin(), regex.end(), '(') + 1;
+SMatch::SMatch(const SMatch& other) {
+  *this = other;
 }
 
-}  // namespace
-#endif
-
-namespace org::apache::nifi::minifi::utils {
+SMatch::SMatch(SMatch&& other) {
+  *this = std::move(other);
+}
 
-#ifndef NO_MORE_REGFREEE
-SMatch::SuffixWrapper SMatch::suffix() const {
-  if ((size_t) matches_[0].match.rm_eo >= string_.size()) {
-    return SuffixWrapper{std::string()};
-  } else {
-    return SuffixWrapper{string_.substr(matches_[0].match.rm_eo)};
+SMatch& SMatch::operator=(const SMatch& other) {
+  if (this == &other) {
+    return *this;
+  }
+  reset(other.string_);
+  matches_.reserve(other.matches_.size());
+  ready_ = other.ready_;
+  for (const auto& sub_match : other.matches_) {
+    size_t begin_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.first));
+    size_t end_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.second));
+    matches_.push_back(Regmatch{sub_match.matched, string_.begin() + begin_off, string_.begin() + end_off});
   }
+  return *this;
+}
+
+SMatch& SMatch::operator=(SMatch&& other) {
+  // trigger the copy assignment, we could optimize this (by moving the string/matches)
+  // but we would need to maintain a separate offsets vector, as after the move the original
+  // sub_matches' iterators are invalidated, if this turns out to be a performance bottleneck
+  // revisit this
+  return *this = other;
+}

Review Comment:
   ~an `SMatch`~ a `Regmatch` object would still have to contain a reference to the stored string, which would need to be updated on copy/move
   
   Edit: sorry, not the `SMatch` but the `Regmatch` object has to maintain a reference to the string stored in the owner `SMatch`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a diff in pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on code in PR #1387:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387#discussion_r966924796


##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -22,58 +22,86 @@
 #include <vector>
 
 #include "Exception.h"
+#include <regex.h>
+
+namespace org::apache::nifi::minifi::utils {
 
 #ifndef NO_MORE_REGFREEE
-namespace {
 
-std::size_t getMaxGroupCountOfRegex(const std::string& regex) {
-  return std::count(regex.begin(), regex.end(), '(') + 1;
+SMatch::SMatch(const SMatch& other) {
+  *this = other;
 }
 
-}  // namespace
-#endif
-
-namespace org::apache::nifi::minifi::utils {
+SMatch::SMatch(SMatch&& other) {
+  *this = std::move(other);
+}
 
-#ifndef NO_MORE_REGFREEE
-SMatch::SuffixWrapper SMatch::suffix() const {
-  if ((size_t) matches_[0].match.rm_eo >= string_.size()) {
-    return SuffixWrapper{std::string()};
-  } else {
-    return SuffixWrapper{string_.substr(matches_[0].match.rm_eo)};
+SMatch& SMatch::operator=(const SMatch& other) {
+  if (this == &other) {
+    return *this;
+  }
+  reset(other.string_);
+  matches_.reserve(other.matches_.size());
+  ready_ = other.ready_;
+  for (const auto& sub_match : other.matches_) {
+    size_t begin_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.first));
+    size_t end_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.second));
+    matches_.push_back(Regmatch{sub_match.matched, string_.begin() + begin_off, string_.begin() + end_off});

Review Comment:
   changed these to `emplace_back`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a diff in pull request #1387: MINIFICPP-1903 - Take advantage of more optimal regex methods on supported platforms

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on code in PR #1387:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1387#discussion_r966864057


##########
libminifi/src/utils/RegexUtils.cpp:
##########
@@ -22,58 +22,86 @@
 #include <vector>
 
 #include "Exception.h"
+#include <regex.h>
+
+namespace org::apache::nifi::minifi::utils {
 
 #ifndef NO_MORE_REGFREEE
-namespace {
 
-std::size_t getMaxGroupCountOfRegex(const std::string& regex) {
-  return std::count(regex.begin(), regex.end(), '(') + 1;
+SMatch::SMatch(const SMatch& other) {
+  *this = other;
 }
 
-}  // namespace
-#endif
-
-namespace org::apache::nifi::minifi::utils {
+SMatch::SMatch(SMatch&& other) {
+  *this = std::move(other);
+}
 
-#ifndef NO_MORE_REGFREEE
-SMatch::SuffixWrapper SMatch::suffix() const {
-  if ((size_t) matches_[0].match.rm_eo >= string_.size()) {
-    return SuffixWrapper{std::string()};
-  } else {
-    return SuffixWrapper{string_.substr(matches_[0].match.rm_eo)};
+SMatch& SMatch::operator=(const SMatch& other) {
+  if (this == &other) {
+    return *this;
+  }
+  reset(other.string_);
+  matches_.reserve(other.matches_.size());
+  ready_ = other.ready_;
+  for (const auto& sub_match : other.matches_) {
+    size_t begin_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.first));
+    size_t end_off = gsl::narrow<size_t>(std::distance(other.string_.begin(), sub_match.second));
+    matches_.push_back(Regmatch{sub_match.matched, string_.begin() + begin_off, string_.begin() + end_off});
   }
+  return *this;
+}
+
+SMatch& SMatch::operator=(SMatch&& other) {
+  // trigger the copy assignment, we could optimize this (by moving the string/matches)
+  // but we would need to maintain a separate offsets vector, as after the move the original
+  // sub_matches' iterators are invalidated, if this turns out to be a performance bottleneck
+  // revisit this
+  return *this = other;
+}

Review Comment:
   sorry, not the `SMatch` but the `Regmatch` object has to maintain a reference to the string stored in the owner `SMatch`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org