Posted to commits@samza.apache.org by "Bharath Kumarasubramanian (Jira)" <ji...@apache.org> on 2021/09/07 17:25:00 UTC

[jira] [Commented] (SAMZA-2663) Update active job model to proposed job model on job model expiration

    [ https://issues.apache.org/jira/browse/SAMZA-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411374#comment-17411374 ] 

Bharath Kumarasubramanian commented on SAMZA-2663:
--------------------------------------------------

Potential Solutions
* Update the active job model during the job model expiration flow. However, this approach is tricky: other flows still need the old (active) job model during onNewJobModel, so it requires somewhat more involved changes. Additionally, since all actions are executed on the debounce timer, we need to account for event ordering (ZK notifications, rebalances, barrier timeouts).

* Track whether job model expiration happened, and in the onNewJobModel flow, trigger the new job model flow whenever it did. The optimization introduced earlier for handling no change in work assignment is applied during expiration, so it should continue to work as expected.
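
The second option can be illustrated with a minimal Java sketch. It uses a deliberately simplified, hypothetical model of a single processor and is not Samza's ZkJobCoordinator code. Several names are assumptions made for illustration: the class `ExpirationTrackingProcessor`, the `jobModelExpired` flag, and the representation of a job model as a map from processor id to task names. The sketch replays the v1/v2/v3 scenario from the issue description, in which the barrier for v2 times out.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of one processor; NOT Samza's ZkJobCoordinator.
// A "job model" is reduced to a map from processor id to its assigned task names.
public class ExpirationTrackingProcessor {
  private final String processorId;
  private Map<String, List<String>> activeJobModel;
  private boolean containerRunning = true;
  // Assumed flag for option 2: records that expiration stopped the container.
  private boolean jobModelExpired = false;

  public ExpirationTrackingProcessor(String processorId, Map<String, List<String>> initialJobModel) {
    this.processorId = processorId;
    this.activeJobModel = initialJobModel;
  }

  // Expire (stop the container) only if this processor's work assignment changed
  // relative to the active job model; stopping again is idempotent in this model.
  public void checkAndExpireJobModel(Map<String, List<String>> proposedJobModel) {
    if (!Objects.equals(activeJobModel.get(processorId), proposedJobModel.get(processorId))) {
      containerRunning = false;
      jobModelExpired = true;
    }
  }

  // Run the new job model flow whenever an expiration happened, even if the new
  // assignment matches the (stale) active job model.
  public void onNewJobModel(Map<String, List<String>> newJobModel) {
    if (!jobModelExpired) {
      return; // no change in work assignment: keep the running container
    }
    activeJobModel = newJobModel;
    containerRunning = true; // stands in for starting the container
    jobModelExpired = false;
  }

  public boolean isContainerRunning() {
    return containerRunning;
  }

  public Map<String, List<String>> getActiveJobModel() {
    return activeJobModel;
  }

  public static void main(String[] args) {
    Map<String, List<String>> v1 = Map.of("P1", List.of("t0", "t1"));
    Map<String, List<String>> v2 = Map.of("P1", List.of("t0"));
    Map<String, List<String>> v3 = Map.of("P1", List.of("t0", "t1")); // same work as v1

    ExpirationTrackingProcessor p1 = new ExpirationTrackingProcessor("P1", v1);
    p1.checkAndExpireJobModel(v2); // assignment changed: container stops
    // The barrier for v2 times out, so onNewJobModel(v2) is never called.
    p1.checkAndExpireJobModel(v3); // v3 matches v1: no new expiration
    p1.onNewJobModel(v3);          // flag is still set: container restarts
    System.out.println(p1.isContainerRunning()); // prints "true"
  }
}
```

The no-change comparison lives in `checkAndExpireJobModel`. A rebalance that leaves P1's assignment untouched therefore never sets the flag, and `onNewJobModel` keeps the running container as before.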

> Update active job model to proposed job model on job model expiration
> ---------------------------------------------------------------------
>
>                 Key: SAMZA-2663
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2663
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Bharath Kumarasubramanian
>            Assignee: Bharath Kumarasubramanian
>            Priority: Major
>
> *Problem*:
> As part of SAMZA-2638, we introduced skipping container restarts and stops when a processor's work assignment does not change across rebalances. However, we only update the active job model with the proposed job model when starting the container as part of `onNewJobModel`. This leads to a scenario where the processor has stopped its container but future rebalances assume the container is still running. More information on the scenario is below.
> *Scenario*: 
> Imagine the quorum is in a steady state with job model version v1. A new rebalance occurs and the leader generates v2. Processor P1 has changes in its work assignment and, as a result, stops its container as part of job model expiration. However, if the rebalance is unsuccessful (the barrier times out), a new rebalance occurs and generates job model version v3. If P1's work assignment in v3 is the same as in v1, the state transition assumes the processor hasn't stopped its container and proceeds to do a no-op.
> *Changes*:
> Update the active job model during checkAndExpireJobModel, regardless of whether the current job model is rendered obsolete for the current processor.
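
The scenario above can be replayed with a small, hypothetical Java model of the pre-fix behavior. This is a simplification, not the actual Samza code. In the model, `activeJobModel` is updated only inside `onNewJobModel`, and both methods compare only this processor's work assignment. The class name `PreFixProcessor` and the map-based job model representation are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of the pre-fix behavior described in the issue; NOT Samza code.
// A "job model" is reduced to a map from processor id to its assigned task names.
public class PreFixProcessor {
  private final String processorId;
  private Map<String, List<String>> activeJobModel;
  private boolean containerRunning = true;

  public PreFixProcessor(String processorId, Map<String, List<String>> initialJobModel) {
    this.processorId = processorId;
    this.activeJobModel = initialJobModel;
  }

  // Compares only this processor's work assignment against the active job model.
  private boolean assignmentChanged(Map<String, List<String>> jobModel) {
    return !Objects.equals(activeJobModel.get(processorId), jobModel.get(processorId));
  }

  // Stops the container on a changed assignment but does NOT update activeJobModel.
  public void checkAndExpireJobModel(Map<String, List<String>> proposedJobModel) {
    if (assignmentChanged(proposedJobModel)) {
      containerRunning = false;
    }
  }

  // activeJobModel is only updated here, when the container is (re)started.
  public void onNewJobModel(Map<String, List<String>> newJobModel) {
    if (!assignmentChanged(newJobModel)) {
      return; // no-op: assumes the container is still running
    }
    activeJobModel = newJobModel;
    containerRunning = true; // stands in for starting the container
  }

  public boolean isContainerRunning() {
    return containerRunning;
  }

  public Map<String, List<String>> getActiveJobModel() {
    return activeJobModel;
  }

  public static void main(String[] args) {
    Map<String, List<String>> v1 = Map.of("P1", List.of("t0", "t1"));
    Map<String, List<String>> v2 = Map.of("P1", List.of("t0"));
    Map<String, List<String>> v3 = Map.of("P1", List.of("t0", "t1")); // same work as v1

    PreFixProcessor p1 = new PreFixProcessor("P1", v1);
    p1.checkAndExpireJobModel(v2); // container stops; activeJobModel stays v1
    // The barrier for v2 times out, so onNewJobModel(v2) is never called.
    p1.checkAndExpireJobModel(v3); // v3 matches the stale v1: nothing to expire
    p1.onNewJobModel(v3);          // no change vs v1: no-op
    System.out.println(p1.isContainerRunning()); // prints "false" (the bug)
  }
}
```

At the end of the run, the container is stopped while `activeJobModel` still holds v1. That mismatch is what turns the v3 rebalance into a no-op.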



--
This message was sent by Atlassian Jira
(v8.3.4#803005)