You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "mapleFU (via GitHub)" <gi...@apache.org> on 2023/09/12 03:42:58 UTC

[GitHub] [arrow] mapleFU opened a new issue, #37670: [C++] FS: Would S3 OutputStream AsyncClose meets lifetime issue?

mapleFU opened a new issue, #37670:
URL: https://github.com/apache/arrow/issues/37670

   ### Describe the enhancement requested
   
   During `CloseAsync`:
   
   ```c++
     Future<> CloseAsync() override {
       if (closed_) return Status::OK();
   
       if (current_part_) {
         // Upload last part
         RETURN_NOT_OK(CommitCurrentPart());
       }
   
       // S3 mandates at least one part, upload an empty one if necessary
       if (part_number_ == 1) {
         RETURN_NOT_OK(UploadPart("", 0));
       }
   
       // Wait for in-progress uploads to finish (if async writes are enabled)
       return FlushAsync().Then([this]() {
         ARROW_ASSIGN_OR_RAISE(auto client_lock, holder_->Lock());
   
         // At this point, all part uploads have finished successfully
         DCHECK_GT(part_number_, 1);
         DCHECK_EQ(upload_state_->completed_parts.size(),
                   static_cast<size_t>(part_number_ - 1));
   
         S3Model::CompletedMultipartUpload completed_upload;
         completed_upload.SetParts(upload_state_->completed_parts);
         S3Model::CompleteMultipartUploadRequest req;
         req.SetBucket(ToAwsString(path_.bucket));
         req.SetKey(ToAwsString(path_.key));
         req.SetUploadId(upload_id_);
         req.SetMultipartUpload(std::move(completed_upload));
   
         auto outcome =
             client_lock.Move()->CompleteMultipartUploadWithErrorFixup(std::move(req));
         if (!outcome.IsSuccess()) {
           return ErrorToStatus(
               std::forward_as_tuple("When completing multiple part upload for key '",
                                     path_.key, "' in bucket '", path_.bucket, "': "),
               "CompleteMultipartUpload", outcome.GetError());
         }
   
         holder_ = nullptr;
         closed_ = true;
         return Status::OK();
       });
     }
   ```
   
   Notice that in `FlushAsync().Then`, the lambda capture `this`. When:
   1. user call `FlushAsync()`
   2. destruct `OutputStream` without waiting for future
   3. Call `Then([this]()`...
   
   Would this raise lifetime problem? Or here is some lifetime constraint I didn't catch?
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #37670: [C++] FS: Would S3 OutputStream CloseAsync meets lifetime issue?

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #37670:
URL: https://github.com/apache/arrow/issues/37670#issuecomment-1717403692

   @pitrou Should I just wrap `s3fs`'s `ObjectOutputStream` with `enable_shared_from_this`, or wrap the `arrow::io::OutputStream` with `enable_shared_from_this`? Wrap the `ObjectOutputStream` looks good to me. But I wonder would `arrow::io::OutputStream` harm?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #37670: [C++] FS: Would S3 OutputStream CloseAsync meets lifetime issue?

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #37670:
URL: https://github.com/apache/arrow/issues/37670#issuecomment-1716945963

   This looks ok to me, but would it break the user who is using `std::unique_ptr<arrow::io::OutputStream>` ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on issue #37670: [C++] FS: Would S3 OutputStream CloseAsync meets lifetime issue?

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #37670:
URL: https://github.com/apache/arrow/issues/37670#issuecomment-1717392497

   > This looks ok to me, but would it break the user who is using `std::unique_ptr<arrow::io::OutputStream>` ?
   
   In which context would they get a `unique_ptr`? The public APIs return a `shared_ptr`:
   https://github.com/apache/arrow/blob/4fac528e2ee9c4efce453420275cbaf0b3b6adb0/cpp/src/arrow/filesystem/s3fs.h#L299-L311
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on issue #37670: [C++] FS: Would S3 OutputStream CloseAsync meets lifetime issue?

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #37670:
URL: https://github.com/apache/arrow/issues/37670#issuecomment-1717591894

   We should wrap `arrow::io::OutputStream`, since that's probably going to be useful for other filesystems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on issue #37670: [C++] FS: Would S3 OutputStream CloseAsync meets lifetime issue?

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on issue #37670:
URL: https://github.com/apache/arrow/issues/37670#issuecomment-1716071486

   @pitrou @lidavidm since I'm not so familiar with s3fs, would you mind take a look? Would this be a problem? Or here is some lifetime constraint I didn't catch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on issue #37670: [C++] FS: Would S3 OutputStream CloseAsync meets lifetime issue?

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #37670:
URL: https://github.com/apache/arrow/issues/37670#issuecomment-1716157285

   Hmm... perhaps `OutputStream` should inherit from `std::enable_shared_from_this` like `InputStream` does, so that we can capture an owning pointer in the closure?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz closed issue #37670: [C++] FS: Would S3 OutputStream CloseAsync meets lifetime issue?

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz closed issue #37670: [C++] FS: Would S3 OutputStream CloseAsync meets lifetime issue?
URL: https://github.com/apache/arrow/issues/37670


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org