Posted to commits@samza.apache.org by "Yan Fang (JIRA)" <ji...@apache.org> on 2015/04/28 23:02:07 UTC

[jira] [Created] (SAMZA-662) Samza auto-creates changelog stream without sufficient partitions when container number > 1

Yan Fang created SAMZA-662:
------------------------------

             Summary: Samza auto-creates changelog stream without sufficient partitions when container number > 1
                 Key: SAMZA-662
                 URL: https://issues.apache.org/jira/browse/SAMZA-662
             Project: Samza
          Issue Type: Bug
          Components: container
    Affects Versions: 0.9.0
            Reporter: Yan Fang


We currently auto-create changelog streams. However, when a job has more than one container, Samza may create a changelog stream with too few partitions.

Reason:
Assume we have an input stream with 3 partitions and we assign 3 containers to the job. According to the [JobCoordinator|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala], we will get:
||Container(Model) || InputStream Partition || Changelog Partition ||
|0 | 0 | 0|
|1 | 1 | 1|
|2 | 2 | 2|

If Container 0 is brought up first, it calls 
{code}
    val maxChangeLogStreamPartitions = containerModel.getTasks.values
            .max(Ordering.by { task:TaskModel => task.getChangelogPartition.getPartitionId })
            .getChangelogPartition.getPartitionId + 1
{code}
in [SamzaContainer|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala].

For Container 0, maxChangeLogStreamPartitions is 0 + 1 = 1, because the computation only looks at that container's own tasks. So we will auto-create a changelog stream with only 1 partition.

Similarly, if Container 1 is brought up first, we will get a stream with 2 partitions. Only when Container 2 happens to come up first does the stream get all 3 partitions.
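
The arithmetic can be reproduced with a small self-contained sketch. SketchTask and SketchContainer are hypothetical stand-ins for Samza's TaskModel and ContainerModel, not the real classes, and jobWideCount is only one assumed way to make the count independent of start-up order, not necessarily the fix the project will adopt.

```scala
// Hypothetical stand-ins for Samza's TaskModel and ContainerModel;
// only the fields needed for this illustration are modelled.
final case class SketchTask(changelogPartitionId: Int)
final case class SketchContainer(id: Int, tasks: Seq[SketchTask])

object ChangelogPartitionSketch {
  // Mirrors the per-container computation quoted in this issue: the highest
  // changelog partition id among this container's own tasks, plus one.
  def perContainerCount(container: SketchContainer): Int =
    container.tasks.map(_.changelogPartitionId).max + 1

  // Assumed alternative: take the maximum over the tasks of ALL containers,
  // so the result no longer depends on which container starts first.
  def jobWideCount(containers: Seq[SketchContainer]): Int =
    containers.flatMap(_.tasks).map(_.changelogPartitionId).max + 1

  // The 3-container layout from the table: container i owns changelog partition i.
  val containers: Seq[SketchContainer] =
    (0 to 2).map(i => SketchContainer(i, Seq(SketchTask(i))))

  def main(args: Array[String]): Unit = {
    containers.foreach { c =>
      println(s"container ${c.id} first -> ${perContainerCount(c)} partition(s)")
    }
    println(s"job-wide -> ${jobWideCount(containers)} partition(s)")
  }
}
```

With the layout from the table, the per-container count is 1, 2, or 3 depending on which container starts first, while the job-wide count is always 3.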



