You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2016/11/10 20:11:58 UTC
[jira] [Created] (BEAM-958) desiredNumWorkers in Dataflow is too
low
Raghu Angadi created BEAM-958:
---------------------------------
Summary: desiredNumWorkers in Dataflow is too low
Key: BEAM-958
URL: https://issues.apache.org/jira/browse/BEAM-958
Project: Beam
Issue Type: Improvement
Components: runner-dataflow
Affects Versions: 0.3.0-incubating
Reporter: Raghu Angadi
Assignee: Davor Bonaci
{{desiredNumWorkers}} in [UnboundedSource API|https://github.com/apache/incubator-beam/blob/v0.3.0-incubating-RC1/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java#L69] is a suggestion to a source about how many splits it should create. KafkaIO currently takes this literally and only creates up to this many splits.
The main draw back is that it is very low in Dataflow. It is calculated as
* {{1 * maxNumWorkers}} if {{--maxNumWorkers}} is specified, otherwise
* {{3 * numWorkers}}.
That implies there is only single reader per worker (which is usually a 4 core VM). That can leave CPU under utilized on many pipelines.
Even 3x in case of fixes number of workers seems low to me.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)