Posted to commits@beam.apache.org by "Micah Wylde (JIRA)" <ji...@apache.org> on 2018/10/11 21:55:00 UTC

[jira] [Created] (BEAM-5724) Beam creates too many sdk_worker processes with --sdk-worker-parallelism=stage

Micah Wylde created BEAM-5724:
---------------------------------

             Summary: Beam creates too many sdk_worker processes with --sdk-worker-parallelism=stage
                 Key: BEAM-5724
                 URL: https://issues.apache.org/jira/browse/BEAM-5724
             Project: Beam
          Issue Type: Improvement
          Components: runner-flink
            Reporter: Micah Wylde


In the Flink portable runner, we currently support two options for SDK worker parallelism (how many Python worker processes we run). The default is one worker per TaskManager, and with --sdk-worker-parallelism=stage you get one worker per executable stage. However, for complex pipelines with many Beam operators that get fused into a single Flink task, this can produce hundreds of worker processes per TaskManager.
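
For context, this is roughly how the option is wired into a Python pipeline. A minimal sketch, assuming a locally running Flink job server (the job endpoint address is a placeholder, and option names follow the Python SDK's underscore spelling):

    # Sketch of running a Python pipeline on the portable Flink runner
    # with per-stage SDK workers.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",    # placeholder job server address
        "--sdk_worker_parallelism=stage",   # one sdk_worker per executable stage
    ])

    with beam.Pipeline(options=options) as p:
        (p
         | beam.Create([1, 2, 3])
         | beam.Map(lambda x: x * x))

With a pipeline like this, every executable stage that lands on a TaskManager brings up its own sdk_worker, which is what blows up the process count for heavily fused pipelines.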

Flink uses the notion of task slots to limit resource utilization on a machine, and I think Beam should respect those limits as well. Ideally, we'd produce a single Python worker per task slot/Flink operator chain, as sketched below.
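
A hypothetical sketch of the proposed behavior (none of these names come from the Beam codebase; start_sdk_worker is a placeholder, not a real API): cache one worker per task slot and reuse it for every stage chained into that slot.

    # Hypothetical sketch of "one SDK worker per task slot": a pool keyed
    # by slot id that reuses an existing worker instead of spawning a new
    # one per stage. start_sdk_worker() is a placeholder, not a Beam API.
    import threading

    def start_sdk_worker(slot_id):
        # Placeholder: in a real runner this would launch an sdk_worker
        # process for the SDK environment assigned to this slot.
        return object()

    class SlotScopedWorkerPool:
        def __init__(self):
            self._lock = threading.Lock()
            self._workers = {}  # slot id -> worker handle

        def worker_for(self, slot_id):
            # All stages fused into the operator chain running in this
            # slot share one worker, so the number of workers is bounded
            # by the number of task slots, not the number of stages.
            with self._lock:
                if slot_id not in self._workers:
                    self._workers[slot_id] = start_sdk_worker(slot_id)
                return self._workers[slot_id]

    pool = SlotScopedWorkerPool()
    w1 = pool.worker_for(slot_id=0)  # first stage in slot 0 starts a worker
    w2 = pool.worker_for(slot_id=0)  # second stage in slot 0 reuses it
    assert w1 is w2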


