Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/04/20 12:10:17 UTC

[GitHub] [nifi-minifi-cpp] adamdebreceni opened a new pull request, #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

adamdebreceni opened a new pull request, #1310:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310

   Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
        in the commit message?
   
   - [ ] Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically main)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   ### For code changes:
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the LICENSE file?
   - [ ] If applicable, have you updated the NOTICE file?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI results for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi-minifi-cpp] szaszm closed pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

Posted by GitBox <gi...@apache.org>.
szaszm closed pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310




[GitHub] [nifi-minifi-cpp] fgerlits commented on a diff in pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

Posted by GitBox <gi...@apache.org>.
fgerlits commented on code in PR #1310:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310#discussion_r871348747


##########
libminifi/src/utils/file/FileUtils.cpp:
##########
@@ -49,41 +52,21 @@ uint64_t computeChecksum(const std::string &file_name, uint64_t up_to_position)
 }
 
 bool contains(const std::filesystem::path& file_path, std::string_view text_to_search) {
-  gsl_Expects(text_to_search.size() <= 8192);
+  gsl_Expects(text_to_search.size() <= 8_KiB);
   gsl_ExpectsAudit(std::filesystem::exists(file_path));
-  std::array<char, 8192> buf1{};
-  std::array<char, 8192> buf2{};
-  gsl::span<char> left = buf1;
-  gsl::span<char> right = buf2;
-
-  const auto charat = [&](size_t idx) {
-    if (idx < left.size()) {
-      return left[idx];
-    } else if (idx < left.size() + right.size()) {
-      return right[idx - left.size()];
-    } else {
-      return '\0';
-    }
-  };
-  const auto check_range = [&](size_t start, size_t end) -> size_t {
-    for (size_t i = start; i < end; ++i) {
-      size_t j{};
-      for (j = 0; j < text_to_search.size(); ++j) {
-        if (charat(i + j) != text_to_search[j]) break;
-      }
-      if (j == text_to_search.size()) return true;
-    }
-    return false;
-  };
+  std::array<char, 16_KiB> buf{};
+
+  Searcher searcher(text_to_search.begin(), text_to_search.end());
 
   std::ifstream ifs{file_path, std::ios::binary};
-  ifs.read(right.data(), gsl::narrow<std::streamsize>(right.size()));
   do {
-    std::swap(left, right);
-    ifs.read(right.data(), gsl::narrow<std::streamsize>(right.size()));
-    if (check_range(0, left.size())) return true;
+    std::copy(buf.end() - text_to_search.size(), buf.end(), buf.begin());
+    ifs.read(buf.data() + text_to_search.size(), buf.size() - text_to_search.size());
+    if (std::search(buf.begin(), buf.end(), searcher) != buf.end()) {
+      return true;
+    }
   } while (ifs);
-  return check_range(left.size(), left.size() + right.size());
+  return std::search(buf.begin(), buf.end(), searcher) != buf.end();

Review Comment:
   After the last chunk is read, if it is shorter than the capacity of the buffer, the end of `buf` will still contain bytes left over from the previous chunk. Is it possible that, e.g., `text_to_search` is `"abcde"` and it is found incorrectly (a false positive) because the file ends with `"abc"` and that is followed in the buffer by a `"de"` that happened to be there from the previous chunk?





[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a diff in pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on code in PR #1310:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310#discussion_r872145717


##########
libminifi/src/utils/file/FileUtils.cpp:
##########
@@ -49,41 +52,21 @@ uint64_t computeChecksum(const std::string &file_name, uint64_t up_to_position)
 }
 
 bool contains(const std::filesystem::path& file_path, std::string_view text_to_search) {
-  gsl_Expects(text_to_search.size() <= 8192);
+  gsl_Expects(text_to_search.size() <= 8_KiB);
   gsl_ExpectsAudit(std::filesystem::exists(file_path));
-  std::array<char, 8192> buf1{};
-  std::array<char, 8192> buf2{};
-  gsl::span<char> left = buf1;
-  gsl::span<char> right = buf2;
-
-  const auto charat = [&](size_t idx) {
-    if (idx < left.size()) {
-      return left[idx];
-    } else if (idx < left.size() + right.size()) {
-      return right[idx - left.size()];
-    } else {
-      return '\0';
-    }
-  };
-  const auto check_range = [&](size_t start, size_t end) -> size_t {
-    for (size_t i = start; i < end; ++i) {
-      size_t j{};
-      for (j = 0; j < text_to_search.size(); ++j) {
-        if (charat(i + j) != text_to_search[j]) break;
-      }
-      if (j == text_to_search.size()) return true;
-    }
-    return false;
-  };
+  std::array<char, 16_KiB> buf{};
+
+  Searcher searcher(text_to_search.begin(), text_to_search.end());
 
   std::ifstream ifs{file_path, std::ios::binary};
-  ifs.read(right.data(), gsl::narrow<std::streamsize>(right.size()));
   do {
-    std::swap(left, right);
-    ifs.read(right.data(), gsl::narrow<std::streamsize>(right.size()));
-    if (check_range(0, left.size())) return true;
+    std::copy(buf.end() - text_to_search.size(), buf.end(), buf.begin());
+    ifs.read(buf.data() + text_to_search.size(), buf.size() - text_to_search.size());
+    if (std::search(buf.begin(), buf.end(), searcher) != buf.end()) {
+      return true;
+    }
   } while (ifs);
-  return check_range(left.size(), left.size() + right.size());
+  return std::search(buf.begin(), buf.end(), searcher) != buf.end();

Review Comment:
   Indeed, it could result in false positives. I updated it in [18e826e](https://github.com/apache/nifi-minifi-cpp/pull/1310/commits/18e826eb034124eacfe13edc90f588e1f6c0589a).





[GitHub] [nifi-minifi-cpp] fgerlits commented on a diff in pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

Posted by GitBox <gi...@apache.org>.
fgerlits commented on code in PR #1310:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310#discussion_r872195680


##########
libminifi/src/utils/file/FileUtils.cpp:
##########
@@ -49,41 +52,23 @@ uint64_t computeChecksum(const std::string &file_name, uint64_t up_to_position)
 }
 
 bool contains(const std::filesystem::path& file_path, std::string_view text_to_search) {
-  gsl_Expects(text_to_search.size() <= 8192);
+  gsl_Expects(text_to_search.size() <= 8_KiB);
   gsl_ExpectsAudit(std::filesystem::exists(file_path));
-  std::array<char, 8192> buf1{};
-  std::array<char, 8192> buf2{};
-  gsl::span<char> left = buf1;
-  gsl::span<char> right = buf2;
-
-  const auto charat = [&](size_t idx) {
-    if (idx < left.size()) {
-      return left[idx];
-    } else if (idx < left.size() + right.size()) {
-      return right[idx - left.size()];
-    } else {
-      return '\0';
-    }
-  };
-  const auto check_range = [&](size_t start, size_t end) -> size_t {
-    for (size_t i = start; i < end; ++i) {
-      size_t j{};
-      for (j = 0; j < text_to_search.size(); ++j) {
-        if (charat(i + j) != text_to_search[j]) break;
-      }
-      if (j == text_to_search.size()) return true;
-    }
-    return false;
-  };
+  std::array<char, 16_KiB> buf{};
+  gsl::span<char> view;

Review Comment:
   OK, I see there is a Jira for this already: https://issues.apache.org/jira/browse/MINIFICPP-1755.





[GitHub] [nifi-minifi-cpp] lordgamez commented on pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

Posted by GitBox <gi...@apache.org>.
lordgamez commented on PR #1310:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310#issuecomment-1105144011

   I think the clang job should be investigated; the FileUtilsTests failure might be related to this change:
   
   ```
   double free or corruption (!prev)
   
   
   99% tests passed, 1 tests failed out of 183
   
   Total Test time (real) = 300.55 sec
   
   The following tests FAILED:
   	 27 - FileUtilsTests (Timeout)
   
   ```




[GitHub] [nifi-minifi-cpp] adamdebreceni commented on pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on PR #1310:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310#issuecomment-1105155136

   > I think the clang job should be investigated; the FileUtilsTests failure might be related to this change:
   > 
   > ```
   > double free or corruption (!prev)
   > 
   > 
   > 99% tests passed, 1 tests failed out of 183
   > 
   > Total Test time (real) = 300.55 sec
   > 
   > The following tests FAILED:
   > 	 27 - FileUtilsTests (Timeout)
   > ```
   
   Yes, it is currently being investigated. I will move the PR to draft until it is sorted out.




[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a diff in pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on code in PR #1310:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310#discussion_r872170800


##########
libminifi/src/utils/file/FileUtils.cpp:
##########
@@ -49,41 +52,23 @@ uint64_t computeChecksum(const std::string &file_name, uint64_t up_to_position)
 }
 
 bool contains(const std::filesystem::path& file_path, std::string_view text_to_search) {
-  gsl_Expects(text_to_search.size() <= 8192);
+  gsl_Expects(text_to_search.size() <= 8_KiB);
   gsl_ExpectsAudit(std::filesystem::exists(file_path));
-  std::array<char, 8192> buf1{};
-  std::array<char, 8192> buf2{};
-  gsl::span<char> left = buf1;
-  gsl::span<char> right = buf2;
-
-  const auto charat = [&](size_t idx) {
-    if (idx < left.size()) {
-      return left[idx];
-    } else if (idx < left.size() + right.size()) {
-      return right[idx - left.size()];
-    } else {
-      return '\0';
-    }
-  };
-  const auto check_range = [&](size_t start, size_t end) -> size_t {
-    for (size_t i = start; i < end; ++i) {
-      size_t j{};
-      for (j = 0; j < text_to_search.size(); ++j) {
-        if (charat(i + j) != text_to_search[j]) break;
-      }
-      if (j == text_to_search.size()) return true;
-    }
-    return false;
-  };
+  std::array<char, 16_KiB> buf{};
+  gsl::span<char> view;

Review Comment:
   `std::span` is not yet used in our codebase. Unless there is a technical reason to introduce it here, I think we should dedicate a separate PR to replacing all `gsl::span` with `std::span`.





[GitHub] [nifi-minifi-cpp] fgerlits commented on a diff in pull request #1310: MINIFICPP-1806 - Use boyer_moore for extension verification

Posted by GitBox <gi...@apache.org>.
fgerlits commented on code in PR #1310:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1310#discussion_r872161738


##########
libminifi/src/utils/file/FileUtils.cpp:
##########
@@ -49,41 +52,23 @@ uint64_t computeChecksum(const std::string &file_name, uint64_t up_to_position)
 }
 
 bool contains(const std::filesystem::path& file_path, std::string_view text_to_search) {
-  gsl_Expects(text_to_search.size() <= 8192);
+  gsl_Expects(text_to_search.size() <= 8_KiB);
   gsl_ExpectsAudit(std::filesystem::exists(file_path));
-  std::array<char, 8192> buf1{};
-  std::array<char, 8192> buf2{};
-  gsl::span<char> left = buf1;
-  gsl::span<char> right = buf2;
-
-  const auto charat = [&](size_t idx) {
-    if (idx < left.size()) {
-      return left[idx];
-    } else if (idx < left.size() + right.size()) {
-      return right[idx - left.size()];
-    } else {
-      return '\0';
-    }
-  };
-  const auto check_range = [&](size_t start, size_t end) -> size_t {
-    for (size_t i = start; i < end; ++i) {
-      size_t j{};
-      for (j = 0; j < text_to_search.size(); ++j) {
-        if (charat(i + j) != text_to_search[j]) break;
-      }
-      if (j == text_to_search.size()) return true;
-    }
-    return false;
-  };
+  std::array<char, 16_KiB> buf{};
+  gsl::span<char> view;

Review Comment:
   We have `std::span` in C++20. Or is that not yet supported on some of our platforms?




