You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/16 21:56:54 UTC

[GitHub] [arrow] westonpace commented on a change in pull request #11977: ARROW-14930: [C++] Make S3 directory detection more robust

westonpace commented on a change in pull request #11977:
URL: https://github.com/apache/arrow/pull/11977#discussion_r770943221



##########
File path: cpp/src/arrow/filesystem/s3fs.cc
##########
@@ -1732,44 +1752,43 @@ class S3FileSystem::Impl : public std::enable_shared_from_this<S3FileSystem::Imp
 
     auto outcome = client_->HeadObject(req);
     if (outcome.IsSuccess()) {
-      *out = true;
-      return Status::OK();
+      return true;
     }
     if (!backend_) {
       SaveBackend(outcome.GetError());

Review comment:
       Is `SaveBackend` an expensive operation?  If we move this check up above `if (backend_ && *backend_ == S3Backend::Minio)` it seems we could avoid potentially recursing by always getting it right the first time.  But I might be missing something.

##########
File path: cpp/src/arrow/filesystem/s3fs.cc
##########
@@ -1720,7 +1720,27 @@ class S3FileSystem::Impl : public std::enable_shared_from_this<S3FileSystem::Imp
   // a non-empty "directory".  This is a Minio-specific quirk, but we need
   // to handle it for unit testing.
 
-  Status IsEmptyDirectory(const std::string& bucket, const std::string& key, bool* out) {
+  // If this method is called after HEAD on "bucket/key" already returned a 404,
+  // can pass the given outcome to spare a spurious HEAD call.
+  Result<bool> IsEmptyDirectory(
+      const std::string& bucket, const std::string& key,
+      const S3Model::HeadObjectOutcome* previous_outcome = nullptr) {
+    if (previous_outcome) {
+      // Fetch the backend from the previous error
+      DCHECK(!previous_outcome->IsSuccess());
+      if (!backend_) {
+        SaveBackend(previous_outcome->GetError());
+        DCHECK(backend_);
+      }
+      if (backend_ != S3Backend::Minio) {
+        // HEAD already returned a 404, nothing more to do
+        return false;
+      }
+    }
+
+    // We come here in one of two situations:
+    // - we don't know the backend and there is no previous outcome

Review comment:
       ```suggestion
       // - there is no previous outcome
   ```

##########
File path: cpp/src/arrow/filesystem/s3fs.cc
##########
@@ -1720,7 +1720,27 @@ class S3FileSystem::Impl : public std::enable_shared_from_this<S3FileSystem::Imp
   // a non-empty "directory".  This is a Minio-specific quirk, but we need
   // to handle it for unit testing.
 
-  Status IsEmptyDirectory(const std::string& bucket, const std::string& key, bool* out) {
+  // If this method is called after HEAD on "bucket/key" already returned a 404,
+  // can pass the given outcome to spare a spurious HEAD call.
+  Result<bool> IsEmptyDirectory(
+      const std::string& bucket, const std::string& key,
+      const S3Model::HeadObjectOutcome* previous_outcome = nullptr) {

Review comment:
       I don't know if this is one we enforce or not but the style guide prefers `util::optional<S3Model::HeadObjectOutcome>` (well, technically std::optional) instead of `const *`
   
   https://google.github.io/styleguide/cppguide.html#Inputs_and_Outputs




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org