Posted to github@beam.apache.org by "vatanrathi (via GitHub)" <gi...@apache.org> on 2023/04/06 11:55:11 UTC

[GitHub] [beam] vatanrathi commented on pull request #26114: (AWS S3 FS) Fix performance issue of S3 filesystem when reading large files

vatanrathi commented on PR #26114:
URL: https://github.com/apache/beam/pull/26114#issuecomment-1498945441

   @aromanenko-dev I ran a few tests with the changes made by @mosche on some large files (roughly 10 to 100 GB), and performance is good. As I mentioned earlier, on every version from 2.31.0 through 2.46.0 our pipeline runs for hours because of this issue with draining or closing the inputStream, which also seems to read before it closes. With this change, a test pipeline on a 15 GB file finished in 5 minutes, as opposed to 3+ hours on 2.46.0.

