Posted to commits@samza.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/06/05 03:47:04 UTC

[jira] [Commented] (SAMZA-1293) Enable partition expansion of input streams

    [ https://issues.apache.org/jira/browse/SAMZA-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036494#comment-16036494 ] 

ASF GitHub Bot commented on SAMZA-1293:
---------------------------------------

GitHub user lindong28 opened a pull request:

    https://github.com/apache/samza/pull/214

    SAMZA-1293: Enable partition expansion of input streams (SEP-5)

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lindong28/samza SAMZA-1293

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #214
    
----
commit 789ca596a9fe50e8a784582e3a0c33a130228142
Author: Dong Lin <li...@gmail.com>
Date:   2017-06-04T20:04:32Z

    SAMZA-1293: Enable partition expansion of input streams (SEP-5)

----


> Enable partition expansion of input streams
> -------------------------------------------
>
>                 Key: SAMZA-1293
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1293
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Dong Lin
>
> Right now, Samza does not allow the number of partitions of an input stream to increase after a stateful job is created. This causes problems when Kafka is used as the input system, because we need to expand the partitions of an existing topic as its bytes-in rate increases over time, in order to limit the size of the largest partition in Kafka. A Kafka broker may have performance issues if a given partition is too large.
> This patch provides a solution for increasing the number of partitions of the input streams of a stateful Samza job while still ensuring the correctness of the Samza job's output. The solution should work when Kafka is used as the input system. We expect it to work similarly with other input systems as well. The motivation for increasing the number of partitions of a Kafka topic is to 1) improve the performance of the Kafka broker and 2) increase the throughput of the Kafka consumer in the Samza container.
> See SEP-5 (https://cwiki.apache.org/confluence/display/SAMZA/SEP-5%3A+Enable+partition+expansion+of+input+streams) for the design and the interface changes in this patch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)