Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2021/02/24 16:12:29 UTC

[GitHub] [nifi-minifi-cpp] adamdebreceni commented on a change in pull request #1015: MINIFICPP-1509 - Use _stat64 on windows to handle large files

adamdebreceni commented on a change in pull request #1015:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1015#discussion_r582093910



##########
File path: extensions/standard-processors/processors/GetFile.cpp
##########
@@ -228,42 +228,48 @@ void GetFile::pollListing(std::queue<std::string> &list, const GetFileRequest &r
 bool GetFile::acceptFile(std::string fullName, std::string name, const GetFileRequest &request) {
   logger_->log_trace("Checking file: %s", fullName);
 
+#ifdef WIN32
+  struct _stat64 statbuf;
+  if (_stat64(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#else
   struct stat statbuf;
+  if (stat(fullName.c_str(), &statbuf) != 0) {
+    return false;
+  }
+#endif

Review comment:
       We could use `utils::file::exists` and then call `utils::file::file_size` and `utils::file::last_write_time` separately. My concern is that this turns one (supposedly) atomic `stat` call into three separate calls. On the other hand, it would limit the number of places `stat` is called explicitly. Moreover, if we trace where `acceptFile` is called from (`file::list_dir`), we could even ignore the possibility of a non-existent file. The options as I see them:
   
   1. call `stat` explicitly (current)
   2. use 3 calls (`exists`, `file_size`, `last_write_time`)
   3. use 2 calls (`file_size`, `last_write_time`), since we don't care about non-existent files: a) this method is called with existing files, and b) we will handle them downstream anyway when we actually read them
   4. something else (make `file_size` and `last_write_time` return optionals, etc.)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org