Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/09 01:15:25 UTC

[GitHub] [beam] anishnag opened a new issue, #22632: [Feature Request]: Ability to allocate threads across various ParDo of pipeline

anishnag opened a new issue, #22632:
URL: https://github.com/apache/beam/issues/22632

   ### What would you like to happen?
   
   I'm currently using a streaming Apache Beam pipeline on the Dataflow runner with an attached GPU to perform real-time inference. We ingest Pub/Sub messages that contain the GCS path of a data file, which we then download and pre-process before batching and dispatching to the GPU for inference.
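   Roughly, the pipeline has the following shape. This is a simplified sketch assuming the Python SDK; the transform names, subscription path, and batch sizes are placeholders, not our production code:
   
   ```python
   import apache_beam as beam
   from apache_beam.options.pipeline_options import PipelineOptions
   
   
   class DownloadAndPreprocess(beam.DoFn):
       """I/O-bound: fetch the file named in the Pub/Sub message and preprocess it."""
   
       def process(self, message):
           gcs_path = message.decode("utf-8")
           # Download from GCS and run preprocessing here (details omitted).
           yield gcs_path
   
   
   class RunGpuInference(beam.DoFn):
       """Compute-bound: run batched inference on the attached GPU."""
   
       def setup(self):
           # Load the model onto the GPU once per DoFn instance (framework-specific).
           self.model = None
   
       def process(self, batch):
           # Run the batch through the model; a placeholder result is yielded here.
           yield len(batch)
   
   
   with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
       (p
        | "ReadPubSub" >> beam.io.ReadFromPubSub(
              subscription="projects/my-project/subscriptions/my-sub")
        | "Preprocess" >> beam.ParDo(DownloadAndPreprocess())  # I/O-bound: wants many threads
        | "Batch" >> beam.BatchElements(min_batch_size=8, max_batch_size=32)
        | "Infer" >> beam.ParDo(RunGpuInference()))            # GPU-bound: wants one thread
   ```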
   
   The issue is that the earlier preprocessing stages are I/O-bound and benefit from many harness threads, whereas the inference step should ideally run on a single thread to prevent GPU memory oversubscription, even though we already run only one SDK process per worker.
   
   It would be very useful to be able to configure the maximum number of threads allocated to the preprocessing `ParDo`, so that threads go to the stages that need them most. We could then assign a single thread to the inference `ParDo` instead of tuning pipeline parameters empirically until they work in the majority of cases.
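   For context, the closest workaround I know of today is to gate the inference `DoFn` on a process-wide semaphore. This is a sketch under the assumption that one concurrent inference call per SDK process is the right limit; it serializes GPU access but still blocks harness threads that could otherwise be doing I/O, which is why a real per-`ParDo` thread setting would be preferable:
   
   ```python
   import threading
   
   import apache_beam as beam
   
   # Module-level so every harness thread in the SDK process shares the same gate.
   # Workaround only: blocked threads are still consumed and cannot help the I/O-bound stages.
   _GPU_GATE = threading.Semaphore(value=1)
   
   
   class RunGpuInference(beam.DoFn):
       def process(self, batch):
           with _GPU_GATE:  # at most one in-flight inference call per SDK process
               result = len(batch)  # run the batch on the GPU here; placeholder result
           yield result
   ```
   
   The only other knob I'm aware of is the worker-level `--number_of_worker_harness_threads` option, but that caps every stage at once, which is exactly the trade-off this request is trying to avoid.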
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: runner-dataflow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org