Posted to github@arrow.apache.org by "krfricke (via GitHub)" <gi...@apache.org> on 2023/02/07 16:57:07 UTC

[GitHub] [arrow] krfricke commented on issue #15233: [Python] pyarrow.fs.copy_files hangs indefinitely

krfricke commented on issue #15233:
URL: https://github.com/apache/arrow/issues/15233#issuecomment-1421103441

   I think this issue is a duplicate of #32372. I've added more details in that issue, but in a nutshell: `pyarrow.fs.copy_files` hangs for S3 buckets with `use_threads=True` (the default) when more files are uploaded than there are CPU cores available:
   
   ```bash
   mkdir -p /tmp/pa-s3
   cd /tmp/pa-s3
   # Create 7 files
   for i in {1..7}; do touch $i.txt; done
   # This works
   python -c "import pyarrow.fs; pyarrow.fs.copy_files('/tmp/pa-s3', 's3://bucket/folder')"
   # Add an 8th file
   for i in {1..8}; do touch $i.txt; done
   # This hangs forever
   python -c "import pyarrow.fs; pyarrow.fs.copy_files('/tmp/pa-s3', 's3://bucket/folder')"
   ```
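
   As a possible workaround until this is fixed, disabling the thread pool should avoid the hang, assuming the issue is limited to the threaded upload path. `use_threads` is an existing parameter of `pyarrow.fs.copy_files`; the bucket and folder names below are placeholders, as in the reproduction above:

   ```python
   # Workaround sketch: copy serially instead of through the thread pool.
   # 's3://bucket/folder' is a placeholder S3 destination.
   import pyarrow.fs

   pyarrow.fs.copy_files(
       "/tmp/pa-s3",
       "s3://bucket/folder",
       use_threads=False,  # serial copies; sidesteps the threaded hang
   )
   ```

   The trade-off is slower uploads for directories with many files, so this is only a stopgap, not a fix for the underlying deadlock.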


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org