Posted to commits@samza.apache.org by "Shanthoosh Venkataraman (JIRA)" <ji...@apache.org> on 2018/05/08 02:49:00 UTC

[jira] [Closed] (SAMZA-1695) Clear events in debounce queue on session expiration

     [ https://issues.apache.org/jira/browse/SAMZA-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shanthoosh Venkataraman closed SAMZA-1695.
------------------------------------------

> Clear events in debounce queue on session expiration
> ----------------------------------------------------
>
>                 Key: SAMZA-1695
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1695
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>             Fix For: 0.15.0
>
>
> *Scenario:*
> Assume there are three processors in the group [P1, P2, P3] and P1 is the leader.
> 1. The leader processor (P1) loses connectivity with a ZooKeeper server in the ensemble, and its ephemeral processor node is deleted (due to session expiration).
> 2. The immediate successor (P2) to the leader (P1) finds out that the leader is dead and declares itself leader. Processor P2 schedules onProcessorChange to publish the JobModel.
> 3. The ZkClient connection retry logic helps the former leader (P1) reconnect to another ZooKeeper server in the ensemble, and it rejoins as a follower.
> 4. Processor P1 acts on the stale buffered event in its debounce queue (which it received while it was still the leader) and behaves as leader. At this point, two processors are acting as leader (P1 and P2). If P1 proceeds to execute leader actions before P2, P2 will fail (and in the worst case this can cause state corruption).
> *Sample exception logs:*
> [https://gist.github.com/shanthoosh/55410fe4ebf3cfb65281b35f16397cad]
>  
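
The scenario above can be illustrated with a minimal sketch of a debounce queue that discards buffered actions when the session expires. This is not Samza's actual implementation: the class SessionAwareDebounceQueue, its methods, and the generation counter are hypothetical names chosen for illustration. Each action records the session generation in which it was scheduled; on expiration the generation is bumped and all pending actions are cancelled, so an event buffered while the processor was leader cannot fire after it rejoins as a follower.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical debounce queue (illustration only, not a Samza class). */
class SessionAwareDebounceQueue {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  // Most recently scheduled future per action name.
  private final Map<String, ScheduledFuture<?>> pending = new HashMap<>();
  // Incremented on every session expiration; written only under the lock.
  private volatile int generation = 0;

  /** Schedules an action, replacing any pending action with the same name. */
  synchronized void schedule(String actionName, long delayMs, Runnable action) {
    final int scheduledIn = generation;
    ScheduledFuture<?> previous = pending.put(actionName,
        executor.schedule(() -> {
          // Skip the action if the session expired after it was scheduled.
          if (scheduledIn == generation) {
            action.run();
          }
        }, delayMs, TimeUnit.MILLISECONDS));
    if (previous != null) {
      previous.cancel(false); // debounce: only the latest request survives
    }
  }

  /** Intended to be called from the session-expiration callback. */
  synchronized void onSessionExpired() {
    generation++;
    for (ScheduledFuture<?> future : pending.values()) {
      future.cancel(false); // drop buffered events from the old session
    }
    pending.clear();
  }

  /** Stops the timer thread so the JVM can exit. */
  void shutdown() {
    executor.shutdownNow();
  }
}
```

In this sketch, P1 would call onSessionExpired() when its ZooKeeper session expires, before rejoining the group. The generation check covers an action that has already been dequeued but has not yet started; an action that is already running when the session expires is not interrupted.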



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)