Posted to github@beam.apache.org by "iht (via GitHub)" <gi...@apache.org> on 2023/02/13 16:42:34 UTC

[GitHub] [beam] iht opened a new issue, #25447: AvroIO `ReadFiles.withDesiredBundleSizeBytes` should be public

iht opened a new issue, #25447:
URL: https://github.com/apache/beam/issues/25447

   The [AvroIO method `ReadFiles.withDesiredBundleSizeBytes` is marked as private](https://github.com/apache/beam/blob/bb5e200df70a875379ff5a6dfe72325172c3eb77/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L810-L813), so users always have to use the default bundle size of 64 MB.
   
   In streaming applications, this value can be too large and cause memory issues. For instance, [the default number of threads in Dataflow streaming engine is 300 per VM](https://cloud.google.com/dataflow/docs/guides/troubleshoot-oom#beam-javago-sdk), which implies that a VM would need more than 19 GB of memory (300 x 64 MB) to read all the file bundles at once (assuming there were at least 19 GB of data to read).
   
   If that method were public, users could control the bundle size and reduce the amount of memory needed to use `ReadFiles` in streaming.
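
The memory estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes every thread holds one full bundle in memory at the same time, treats MB as the decimal unit (the issue does not say MB or MiB), and uses 8 MB as a purely hypothetical smaller bundle size for comparison. The function name is illustrative and is not part of Beam.

```python
# Rough upper bound on the memory held by in-flight file bundles on one VM.
# Assumption: 1 MB = 10**6 bytes.
MB = 10**6


def worst_case_bundle_memory_bytes(threads_per_vm: int, bundle_size_bytes: int) -> int:
    """Memory needed if every thread holds one full bundle at once."""
    return threads_per_vm * bundle_size_bytes


# Values from the issue: 300 threads per VM, 64 MB default bundle size.
default_estimate = worst_case_bundle_memory_bytes(300, 64 * MB)

# Hypothetical smaller bundle size, chosen only for illustration.
smaller_estimate = worst_case_bundle_memory_bytes(300, 8 * MB)

print(default_estimate / 10**9)  # 19.2 (GB)
print(smaller_estimate / 10**9)  # 2.4 (GB)
```

Under these assumptions the worst case scales linearly with the bundle size, which is why a configurable value would matter for streaming pipelines.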
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev closed issue #25447: [Task] AvroIO `ReadFiles.withDesiredBundleSizeBytes` should be public

Posted by "aromanenko-dev (via GitHub)" <gi...@apache.org>.
aromanenko-dev closed issue #25447: [Task] AvroIO `ReadFiles.withDesiredBundleSizeBytes` should be public
URL: https://github.com/apache/beam/issues/25447

