You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2021/02/24 15:36:57 UTC

[GitHub] [nifi-minifi-cpp] adamdebreceni opened a new pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

adamdebreceni opened a new pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015


   Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
        in the commit message?
   
   - [ ] Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically main)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   ### For code changes:
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the LICENSE file?
   - [ ] If applicable, have you updated the NOTICE file?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI results for build issues and submit an update to your PR as soon as possible.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a change in pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on a change in pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#discussion_r582667443



##########
File path: extensions/standard-processors/processors/GetFile.cpp
##########
@@ -228,42 +228,49 @@ void GetFile::pollListing(std::queue<std::string> &list, const GetFileRequest &r
 bool GetFile::acceptFile(std::string fullName, std::string name, const GetFileRequest &request) {
   logger_->log_trace("Checking file: %s", fullName);
 
+#ifdef WIN32
+  struct _stat64 statbuf;
+  if (_stat64(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#else
   struct stat statbuf;
+  if (stat(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#endif
+  uint64_t file_size = gsl::narrow<uint64_t>(statbuf.st_size);
+  uint64_t modifiedTime = gsl::narrow<uint64_t>(statbuf.st_mtime) * 1000;

Review comment:
       you are right about our inconsistent time representations (we use `uint64_t`, `int64_t` and `time_t` in `TimeUtil.h` to represent time) we should unify it, based on the POSIX and Windows documentations, as I understand we may (cautiously) assume that `st_mtime` yields the Unix time, so these narrowings and the subsequent scaling should be fine




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] szaszm commented on a change in pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
szaszm commented on a change in pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#discussion_r590273675



##########
File path: extensions/standard-processors/processors/GetFile.cpp
##########
@@ -228,42 +228,48 @@ void GetFile::pollListing(std::queue<std::string> &list, const GetFileRequest &r
 bool GetFile::acceptFile(std::string fullName, std::string name, const GetFileRequest &request) {
   logger_->log_trace("Checking file: %s", fullName);
 
+#ifdef WIN32
+  struct _stat64 statbuf;
+  if (_stat64(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#else
   struct stat statbuf;
+  if (stat(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#endif

Review comment:
       I'd stay with 1




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] arpadboda closed pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
arpadboda closed pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a change in pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on a change in pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#discussion_r582594812



##########
File path: extensions/standard-processors/processors/GetFile.cpp
##########
@@ -228,42 +228,48 @@ void GetFile::pollListing(std::queue<std::string> &list, const GetFileRequest &r
 bool GetFile::acceptFile(std::string fullName, std::string name, const GetFileRequest &request) {
   logger_->log_trace("Checking file: %s", fullName);
 
+#ifdef WIN32
+  struct _stat64 statbuf;
+  if (_stat64(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#else
   struct stat statbuf;
+  if (stat(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#endif
 
-  if (stat(fullName.c_str(), &statbuf) == 0) {
-    if (request.minSize > 0 && statbuf.st_size < (int32_t) request.minSize)
-      return false;
-
-    if (request.maxSize > 0 && statbuf.st_size > (int32_t) request.maxSize)
-      return false;
+  if (request.minSize > 0 && statbuf.st_size < request.minSize)

Review comment:
       added gsl::narrow casts

##########
File path: libminifi/include/utils/file/FileUtils.h
##########
@@ -248,18 +248,16 @@ inline uint64_t last_write_time(const std::string &path) {
   if (ec.value() == 0) {
     return result;
   }
-#else
-#ifdef WIN32
-  struct _stat result;
-  if (_stat(path.c_str(), &result) == 0) {
+#elif WIN32

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] fgerlits commented on a change in pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
fgerlits commented on a change in pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#discussion_r582099946



##########
File path: libminifi/include/utils/file/FileUtils.h
##########
@@ -248,18 +248,16 @@ inline uint64_t last_write_time(const std::string &path) {
   if (ec.value() == 0) {
     return result;
   }
-#else
-#ifdef WIN32
-  struct _stat result;
-  if (_stat(path.c_str(), &result) == 0) {
+#elif WIN32

Review comment:
       I think this should be `#elif defined(WIN32)`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] lordgamez commented on a change in pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
lordgamez commented on a change in pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#discussion_r582102407



##########
File path: extensions/standard-processors/processors/GetFile.cpp
##########
@@ -228,42 +228,48 @@ void GetFile::pollListing(std::queue<std::string> &list, const GetFileRequest &r
 bool GetFile::acceptFile(std::string fullName, std::string name, const GetFileRequest &request) {
   logger_->log_trace("Checking file: %s", fullName);
 
+#ifdef WIN32
+  struct _stat64 statbuf;
+  if (_stat64(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#else
   struct stat statbuf;
+  if (stat(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#endif
 
-  if (stat(fullName.c_str(), &statbuf) == 0) {
-    if (request.minSize > 0 && statbuf.st_size < (int32_t) request.minSize)
-      return false;
-
-    if (request.maxSize > 0 && statbuf.st_size > (int32_t) request.maxSize)
-      return false;
+  if (request.minSize > 0 && statbuf.st_size < request.minSize)

Review comment:
       Wouldn't removing the casts introduce sign-compare warnings?

##########
File path: libminifi/include/utils/file/FileUtils.h
##########
@@ -334,20 +332,28 @@ inline bool get_uid_gid(const std::string &path, uint64_t &uid, uint64_t &gid) {
 #endif
 
 inline bool is_directory(const char * path) {
+#ifndef WIN32
   struct stat dir_stat;
-  if (stat(path, &dir_stat) < 0) {
+  if (stat(path, &dir_stat) != 0) {
       return false;
   }
   return S_ISDIR(dir_stat.st_mode) != 0;
+#else
+  struct _stat64 dir_stat;
+  if (_stat64(path, &dir_stat) != 0) {
+      return false;
+  }
+  return S_ISDIR(dir_stat.st_mode) != 0;
+#endif
 }
 
 inline bool exists(const std::string& path) {
 #ifdef USE_BOOST
   return boost::filesystem::exists(path);
 #else
 #ifdef WIN32

Review comment:
       You could merge these as well, like you did in `last_write_time`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] adamdebreceni commented on pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#issuecomment-785173989


   Note: eventually we shouldn't include `<sys/stat.h>` but provide a platform-independent abstraction for it, `FileUtils.h` could be that abstraction but since some functions do not unambiguously signal a non-existent file (e.g. `file_size` returns 0 if the stat fails) currently it is not entirely viable to replace the explicit `stat` in `GetFile` or `PutFile` and possible other places


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a change in pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on a change in pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#discussion_r582093910



##########
File path: extensions/standard-processors/processors/GetFile.cpp
##########
@@ -228,42 +228,48 @@ void GetFile::pollListing(std::queue<std::string> &list, const GetFileRequest &r
 bool GetFile::acceptFile(std::string fullName, std::string name, const GetFileRequest &request) {
   logger_->log_trace("Checking file: %s", fullName);
 
+#ifdef WIN32
+  struct _stat64 statbuf;
+  if (_stat64(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#else
   struct stat statbuf;
+  if (stat(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#endif

Review comment:
       we could use `utils::file::exists` and then call `utils::file::file_size` and `utils::file::last_write_time` separately, my concern with this, is that we took one (supposedly) atomic `stat`-call and made three separate calls, on the other side, that would limit the number of places `stat` is explicitly called from, moreover if we trace where this `acceptFile` is called from (`file::list_dir`) we could even ignore the possibility of a non-existing file
   
   1. call `stat` explicitly (current)
   2. use 3 calls (exists, file_size, last_write_time)
   3. use 2 calls (file_size, last_write_time) because we don't care about non-existing files because a) this method is called with existing files b) we will handle them anyway downstream when we actually read it
   4. something else (make file_size, last_write_time return optionals, etc )




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a change in pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
adamdebreceni commented on a change in pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#discussion_r582594616



##########
File path: libminifi/include/utils/file/FileUtils.h
##########
@@ -334,20 +332,28 @@ inline bool get_uid_gid(const std::string &path, uint64_t &uid, uint64_t &gid) {
 #endif
 
 inline bool is_directory(const char * path) {
+#ifndef WIN32
   struct stat dir_stat;
-  if (stat(path, &dir_stat) < 0) {
+  if (stat(path, &dir_stat) != 0) {
       return false;
   }
   return S_ISDIR(dir_stat.st_mode) != 0;
+#else
+  struct _stat64 dir_stat;
+  if (_stat64(path, &dir_stat) != 0) {
+      return false;
+  }
+  return S_ISDIR(dir_stat.st_mode) != 0;
+#endif
 }
 
 inline bool exists(const std::string& path) {
 #ifdef USE_BOOST
   return boost::filesystem::exists(path);
 #else
 #ifdef WIN32

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] hunyadi-dev commented on a change in pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
hunyadi-dev commented on a change in pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#discussion_r582616439



##########
File path: extensions/standard-processors/processors/GetFile.cpp
##########
@@ -228,42 +228,49 @@ void GetFile::pollListing(std::queue<std::string> &list, const GetFileRequest &r
 bool GetFile::acceptFile(std::string fullName, std::string name, const GetFileRequest &request) {
   logger_->log_trace("Checking file: %s", fullName);
 
+#ifdef WIN32
+  struct _stat64 statbuf;
+  if (_stat64(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#else
   struct stat statbuf;
+  if (stat(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#endif
+  uint64_t file_size = gsl::narrow<uint64_t>(statbuf.st_size);
+  uint64_t modifiedTime = gsl::narrow<uint64_t>(statbuf.st_mtime) * 1000;

Review comment:
       Is this safe (both for the narrowing before scaling and the type usage)? Shouldn't we use `std::difftime`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi-minifi-cpp] adamdebreceni removed a comment on pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

Posted by GitBox <gi...@apache.org>.
adamdebreceni removed a comment on pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#issuecomment-785173989


   Note: eventually we shouldn't include `<sys/stat.h>` but provide a platform-independent abstraction for it, `FileUtils.h` could be that abstraction but since some functions do not unambiguously signal a non-existent file (e.g. `file_size` returns 0 if the stat fails) currently it is not entirely viable to replace the explicit `stat` in `GetFile` or `PutFile` and possible other places


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org