Posted to commits@samza.apache.org by "Lakshmi Manasa Gaduputi (Jira)" <ji...@apache.org> on 2021/09/15 23:21:00 UTC

[jira] [Updated] (SAMZA-2687) Elasticity: scale up task count beyond the input partition count.

     [ https://issues.apache.org/jira/browse/SAMZA-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lakshmi Manasa Gaduputi updated SAMZA-2687:
-------------------------------------------
    Description: 
Problem: A job’s parallelism, and therefore its throughput, is capped by its task count, which equals the partition count of its input streams. If a job is lagging and is already at the maximum container count (= number of tasks = number of input partitions), its only remaining option is to repartition the input. This need arises from the processing time of the job’s logic, which is not under Samza’s control.

Solution: The proposed approach is to let an elastic task consume a portion of a partition (SystemStreamPartition, SSP). An elastic task is the same as a regular task except that it consumes a sub-SSP rather than a whole SSP.
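
The description does not say how a partition would be divided, so the following is only a rough sketch of one possible reading: messages are hashed by key into a fixed number of buckets, and each bucket stands for one sub-SSP. The class SubPartitionSketch, the methods bucketFor and subPartitions, the factor parameter, and the "system.stream.partition#bucket" naming are all hypothetical and are not Samza APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are Samza APIs.
final class SubPartitionSketch {

    private SubPartitionSketch() {}

    // Assumed routing rule: hash the message key into one of `factor` buckets,
    // so every message with the same key lands in the same sub-SSP.
    static int bucketFor(Object key, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        int hash = (key == null) ? 0 : key.hashCode();
        // floorMod keeps the result in [0, factor) even for negative hashes.
        return Math.floorMod(hash, factor);
    }

    // Lists the sub-SSPs of one partition; one elastic task would consume each.
    // The "system.stream.partition#bucket" naming is made up for illustration.
    static List<String> subPartitions(String system, String stream, int partition, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        List<String> result = new ArrayList<>();
        for (int bucket = 0; bucket < factor; bucket++) {
            result.add(system + "." + stream + "." + partition + "#" + bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        int factor = 2; // hypothetical: split each partition two ways
        System.out.println(subPartitions("kafka", "page-views", 3, factor));
        System.out.println(bucketFor("user-42", factor));
    }
}
```

Under this assumed scheme a job with N input partitions could run up to N * factor tasks, and messages sharing a key would still be handled by a single task. Whether the actual design routes by key, by offset, or by something else is left to the SEP.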

SEP to follow shortly

 

  was:
Problem: A job’s parallelism, and therefore its throughput, is capped by its task count, which equals the partition count of its input streams. If a job is lagging and is already at the maximum container count (= number of tasks = number of input partitions), its only remaining option is to repartition the input. This need arises from the processing time of the job’s logic, which is not under Samza’s control.

Solution: The proposed approach is to let a virtual task consume a portion of a partition (SystemStreamPartition, SSP). A virtual task is the same as a regular task except that it consumes a sub-SSP rather than a whole SSP.

SEP to follow shortly

 


> Elasticity: scale up task count beyond the input partition count.
> -----------------------------------------------------------------
>
>                 Key: SAMZA-2687
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2687
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Lakshmi Manasa Gaduputi
>            Assignee: Lakshmi Manasa Gaduputi
>            Priority: Major
>
> Problem: A job’s parallelism, and therefore its throughput, is capped by its task count, which equals the partition count of its input streams. If a job is lagging and is already at the maximum container count (= number of tasks = number of input partitions), its only remaining option is to repartition the input. This need arises from the processing time of the job’s logic, which is not under Samza’s control.
> Solution: The proposed approach is to let an elastic task consume a portion of a partition (SystemStreamPartition, SSP). An elastic task is the same as a regular task except that it consumes a sub-SSP rather than a whole SSP.
> SEP to follow shortly
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)