You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "EpsilonPrime (via GitHub)" <gi...@apache.org> on 2023/03/22 01:13:38 UTC

[GitHub] [arrow] EpsilonPrime opened a new pull request, #34671: GH-15233: [C++] Correct thread contention in s3fs

EpsilonPrime opened a new pull request, #34671:
URL: https://github.com/apache/arrow/pull/34671

   Avoids thread contention which can occur when using CopyFiles to copy files from a filesystem other than S3 to the S3FS.  This is accomplished by not trying to complete the merging of a potentially multipart file asynchronously while already in an I/O thread (instead that work is completed in the already active thread).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34671: GH-15233: [C++] Correct thread contention in s3fs

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34671:
URL: https://github.com/apache/arrow/pull/34671#issuecomment-1478788294

   * Closes: #15233


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34671: GH-15233: [C++] Correct thread contention in s3fs

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34671:
URL: https://github.com/apache/arrow/pull/34671#issuecomment-1478788320

   :warning: GitHub issue #15233 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #34671: GH-15233: [C++] Correct thread contention in s3fs

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34671:
URL: https://github.com/apache/arrow/pull/34671#issuecomment-1480160193

   I think `CloseAsync` is expected to act as an async method.  By pushing the writes to the foreground it is no longer acting as an async method.
   
   This is similar to the issues we ran into on the read path.  The solution was to make everything async by default and the only time we wrap async with sync is at the API layer (e.g. on that first call from the user to a sync method).  This allows us to keep one async path for all the "internals" and only mirror the two paths at the edges.
   
   Instead, could we change `CopyFile` (and `CopyFiles`) to use async methods?  Or create `CopyFileAysnc` and `CopyFilesAsync` (and change `CopyFiles` to return `CopyFilesAsync.result()`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #34671: GH-15233: [C++] Correct thread contention in s3fs

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #34671:
URL: https://github.com/apache/arrow/pull/34671#issuecomment-1516903775

   @EpsilonPrime Just checking back in on this, are you still working on this and we should keep it open? Do you need any assistance in getting it working?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amol- commented on pull request #34671: GH-15233: [C++] Correct thread contention in s3fs

Posted by "amol- (via GitHub)" <gi...@apache.org>.
amol- commented on PR #34671:
URL: https://github.com/apache/arrow/pull/34671#issuecomment-1497231960

   @EpsilonPrime are you still interested in moving this forward? It looks like there are some failing checks and some review feedbacks to address


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] EpsilonPrime commented on a diff in pull request #34671: GH-15233: [C++] Correct thread contention in s3fs

Posted by "EpsilonPrime (via GitHub)" <gi...@apache.org>.
EpsilonPrime commented on code in PR #34671:
URL: https://github.com/apache/arrow/pull/34671#discussion_r1144246166


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -1244,12 +1244,12 @@ class ObjectOutputStream final : public io::OutputStream {
 
     if (current_part_) {
       // Upload last part
-      RETURN_NOT_OK(CommitCurrentPart());
+      RETURN_NOT_OK(CommitCurrentPart(true));

Review Comment:
   Sure, I've gotten away from the practice of adding the extra comments as my IDE does this for me, but done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] EpsilonPrime commented on pull request #34671: GH-15233: [C++] Correct thread contention in s3fs

Posted by "EpsilonPrime (via GitHub)" <gi...@apache.org>.
EpsilonPrime commented on PR #34671:
URL: https://github.com/apache/arrow/pull/34671#issuecomment-1497879670

   > @EpsilonPrime are you still interested in moving this forward? It looks like there are some failing checks and some review feedbacks to address
   
   @amol- Yes, I'm still working on this but my alternate version of the solution (using futures throughout) is not yet working.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 commented on a diff in pull request #34671: GH-15233: [C++] Correct thread contention in s3fs

Posted by "cyb70289 (via GitHub)" <gi...@apache.org>.
cyb70289 commented on code in PR #34671:
URL: https://github.com/apache/arrow/pull/34671#discussion_r1144191277


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -1244,12 +1244,12 @@ class ObjectOutputStream final : public io::OutputStream {
 
     if (current_part_) {
       // Upload last part
-      RETURN_NOT_OK(CommitCurrentPart());
+      RETURN_NOT_OK(CommitCurrentPart(true));

Review Comment:
   Add a comment to make it easier to reason the boolean argument at caller side?
   `CommitCurrentPart(/*always_foreground_writes=*/true)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org