Posted to github@arrow.apache.org by "EpsilonPrime (via GitHub)" <gi...@apache.org> on 2023/03/21 07:33:45 UTC

[GitHub] [arrow] EpsilonPrime commented on issue #15233: [Python] pyarrow.fs.copy_files hangs indefinitely

EpsilonPrime commented on issue #15233:
URL: https://github.com/apache/arrow/issues/15233#issuecomment-1477386944

   I have written a reproduction test case that detects the thread contention issue (it is ready to check in once the fix is ready).  When a file is copied (filesystem.cc:613), CopyStream runs as expected and control then passes to the close routine to complete the copy.  The close routine delegates to CloseAsync, which handles uploading parts by calling UploadPart.  UploadPart in turn adds its work to the thread pool, which overloads the executor.  With an 8-thread pool and 8 tasks (each small enough to fit in a single part), this ends up requiring 16 busy threads in a size-8 executor.
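
   The contention described above can be modelled outside of Arrow.  The following is a minimal Python sketch, not the Arrow code: `run_copies`, `copy_one_file` and `upload_part` are hypothetical names, and a `ThreadPoolExecutor` stands in for the Arrow executor.  Each simulated copy occupies a worker and then waits on a sub-task submitted to the same pool; a barrier makes sure every copy holds its worker before any sub-task is queued, and a timeout replaces the indefinite hang.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_copies(pool_size, num_tasks, wait_seconds=0.5):
    """Return True if every simulated copy finished, False if any starved."""
    if num_tasks > pool_size:
        raise ValueError("this sketch assumes num_tasks <= pool_size")

    # Ensures all copies are running (each holding a worker) before any
    # of them queues its upload, which makes the outcome deterministic.
    all_started = threading.Barrier(num_tasks)

    with ThreadPoolExecutor(max_workers=pool_size) as pool:

        def upload_part():
            # Stand-in for the work UploadPart schedules on the pool.
            return "uploaded"

        def copy_one_file():
            all_started.wait(timeout=5)
            # The close step puts more work on the same pool ...
            part = pool.submit(upload_part)
            try:
                # ... and blocks its own worker until that work is done.
                part.result(timeout=wait_seconds)
                return True
            except FutureTimeout:
                # No free worker picked up the upload in time.
                return False

        copies = [pool.submit(copy_one_file) for _ in range(num_tasks)]
        return all(c.result() for c in copies)


if __name__ == "__main__":
    print("8 tasks, 8 workers:", run_copies(8, 8))
    print("7 tasks, 8 workers:", run_copies(8, 7))
```

   In this toy model, 8 copies on 8 workers starve because no worker is left to run the queued uploads, while 7 copies on 8 workers complete.  That only illustrates the mechanism; it says nothing about whether the same margin is sufficient in the real S3 code path.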
   
   There are two possible solutions.  The easy one is to limit the number of tasks submitted to the pool (merely leaving one extra thread free appears to be enough for the pool to empty, although this needs verification).  The second is to modify the close routine so that it does the work on the existing thread (that is, not asynchronously).  This would require reworking at least 5 functions, and might require even more work for the case where there are multiple parts per file (which we do not have a test for yet).
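
   As a rough illustration of the first option, here is a Python sketch that caps the number of in-flight copies at one fewer than the pool size.  It is an assumption-laden model rather than a proposed patch: `copy_all`, `copy_one_file` and `upload_part` are hypothetical names, and each file is assumed to need a single part.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def copy_all(file_names, pool_size):
    """Copy every file while always keeping one worker free for uploads."""
    if pool_size < 2:
        raise ValueError("need at least 2 workers to keep one free")

    # At most pool_size - 1 copies may hold a worker at the same time.
    slots = threading.BoundedSemaphore(pool_size - 1)

    with ThreadPoolExecutor(max_workers=pool_size) as pool:

        def upload_part(name):
            # Stand-in for the work UploadPart schedules on the pool.
            return f"{name}: uploaded"

        def copy_one_file(name):
            try:
                # Blocks this worker until the upload has run elsewhere.
                return pool.submit(upload_part, name).result()
            finally:
                # Let the submitting thread start the next copy.
                slots.release()

        copies = []
        for name in file_names:
            slots.acquire()  # wait here instead of overfilling the pool
            copies.append(pool.submit(copy_one_file, name))
        return [c.result() for c in copies]


if __name__ == "__main__":
    print(copy_all([f"file{i}" for i in range(20)], pool_size=8))
```

   The throttling happens on the submitting thread, so the pool never holds more blocking copies than it can service.  Files with multiple parts would queue more sub-tasks per copy and are not covered by this sketch.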
   

