You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2016/11/10 20:11:58 UTC

[jira] [Created] (BEAM-958) desiredNumWorkers in Dataflow is too low

Raghu Angadi created BEAM-958:
---------------------------------

             Summary: desiredNumWorkers in Dataflow is too low
                 Key: BEAM-958
                 URL: https://issues.apache.org/jira/browse/BEAM-958
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow
    Affects Versions: 0.3.0-incubating
            Reporter: Raghu Angadi
            Assignee: Davor Bonaci


{{desiredNumWorkers}} in [UnboundedSource API|https://github.com/apache/incubator-beam/blob/v0.3.0-incubating-RC1/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java#L69] is a suggestion to a source about how many splits it should create. KafkaIO currently takes this literally and only creates up to this many splits.

The main draw back is that it is very low in Dataflow. It is calculated as 
  * {{1 * maxNumWorkers}} if {{--maxNumWorkers}} is specified, otherwise
  * {{3 * numWorkers}}.

That implies there is only single reader per worker (which is usually a 4 core VM). That can leave CPU under utilized on many pipelines.
Even 3x in case of fixes number of workers seems low to me. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)