Posted to commits@beam.apache.org by "Micah Wylde (JIRA)" <ji...@apache.org> on 2018/10/11 21:55:00 UTC
[jira] [Created] (BEAM-5724) Beam creates too many sdk_worker processes with --sdk-worker-parallelism=stage
Micah Wylde created BEAM-5724:
---------------------------------
Summary: Beam creates too many sdk_worker processes with --sdk-worker-parallelism=stage
Key: BEAM-5724
URL: https://issues.apache.org/jira/browse/BEAM-5724
Project: Beam
Issue Type: Improvement
Components: runner-flink
Reporter: Micah Wylde
In the Flink portable runner, we currently support two options for SDK worker parallelism (how many Python worker processes we run). The default is one per taskmanager, and with --sdk-worker-parallelism=stage you get one per stage. However, for complex pipelines with many Beam operators that get fused into a single Flink task, this can produce hundreds of worker processes per TM.
Flink uses the notion of task slots to limit resource utilization on a machine; I think Beam should try to respect those limits as well. Ideally, we'd produce a single Python worker per task slot/Flink operator chain.
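To make the scaling problem concrete, here is a small illustrative sketch (not Beam code; the function and policy names are hypothetical) estimating how many SDK worker processes each policy would spawn per taskmanager:

```python
# Hypothetical illustration only -- these names are not Beam or Flink APIs.
# Estimates SDK worker process count per Flink taskmanager under three
# parallelism policies: the current default (one per TM), the current
# "stage" option (one per fused stage per slot), and the proposed
# one-worker-per-task-slot policy.

def workers_per_taskmanager(policy: str, task_slots: int, fused_stages_per_slot: int) -> int:
    """Estimate SDK worker processes per taskmanager for a given policy."""
    if policy == "taskmanager":   # current default: one worker per TM
        return 1
    if policy == "stage":         # one worker per fused stage in each slot
        return task_slots * fused_stages_per_slot
    if policy == "task_slot":     # proposed: one worker per task slot
        return task_slots
    raise ValueError(f"unknown policy: {policy}")

# A complex pipeline: 8 task slots per TM, 40 fused Beam stages per slot.
print(workers_per_taskmanager("stage", 8, 40))      # 320 worker processes
print(workers_per_taskmanager("task_slot", 8, 40))  # 8, bounded by slots
```

Under the "stage" policy the worker count grows with pipeline complexity, while the proposed per-slot policy keeps it bounded by the taskmanager's configured slots.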
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)