Posted to commits@samza.apache.org by "Navina Ramesh (JIRA)" <ji...@apache.org> on 2017/03/02 19:45:45 UTC
[jira] [Commented] (SAMZA-1113) Implement startup and shutdown sequence of jobs in ZK environment

    [ https://issues.apache.org/jira/browse/SAMZA-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892862#comment-15892862 ] 

Navina Ramesh commented on SAMZA-1113:
--------------------------------------

Current Design (as proposed in the design document in [SAMZA-1064|https://issues.apache.org/jira/browse/SAMZA-1064]):
* Processors start up and join the default processing group under /jobname-jobid/processors
* When processors leave, tasks are shuffled to remaining processors
* The job is expected to be Alive or Active if it has at least 1 processor in the processing group.
* The job is assumed to have Shutdown if there are no more processors in the processing group.
* Assumes that we do not support rolling bounce 
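
As a rough illustration of the liveness rule above, the following sketch derives the job status purely from processing-group membership. The class, enum, and method names are hypothetical and are not Samza APIs; the set of processor IDs stands in for the children of /jobname-jobid/processors.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of the current design's rule; not a Samza API.
final class ProcessorGroupLiveness {
    // Only two outcomes are observable from group membership alone.
    enum JobStatus { ACTIVE, SHUTDOWN }

    // The job counts as active while at least one processor is in the group,
    // and as shut down once the group is empty.
    static JobStatus infer(Set<String> liveProcessorIds) {
        return liveProcessorIds.isEmpty() ? JobStatus.SHUTDOWN : JobStatus.ACTIVE;
    }

    public static void main(String[] args) {
        System.out.println(infer(Collections.singleton("processor-1")));
        System.out.println(infer(Collections.<String>emptySet()));
    }
}
```

Note that an empty group maps to a single status here, whatever caused the processors to leave.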

The current design is simple. However, it falls short for the following reasons:
# If there are no processors remaining in the processing group, we don't know whether the job shut down gracefully or all processors failed abruptly. There is no feedback on the "status" of the job.
# It is not clear what should happen if we want to restart or upgrade an existing job, because there is only one "attempt" associated with the job. For example, in a processing group of size 10, if 5 processors are restarted, which "attempt" should they join? Answering this is critical to clearly defining the lifecycle of the job itself and how to maintain/upgrade it over time.
# The lack of a clearly defined job lifecycle directly impacts the abstraction layer above (DAG handler - see SAMZA-1041), i.e. ApplicationRunner/ExecutionEnvironment, because that layer directly manages the individual stages of the job.

Requirements:
* Associate an attempt number with a particular job's scope
* For each attempt, we should be able to infer the state of the job (whether it is ACTIVE, SHUTDOWN or FAILED)
* The trigger for a graceful shutdown can come from an external entity or from the job itself (as in batch jobs)
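
One possible reading of these requirements is sketched below. Everything in it is an assumption made for illustration: the "attempts" path segment, the shutdown marker (imagined as a node written when a graceful shutdown is triggered), and all class and method names are hypothetical and do not come from the design document or from Samza.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of attempt-scoped job state; not a Samza API.
final class JobAttemptSketch {
    enum AttemptState { ACTIVE, SHUTDOWN, FAILED }

    // Assumed layout: each attempt gets its own processors subtree.
    static String processorsPath(String jobName, String jobId, int attempt) {
        return "/" + jobName + "-" + jobId + "/attempts/" + attempt + "/processors";
    }

    // While any processor is still in the group, the attempt counts as ACTIVE,
    // even if a shutdown has already been requested. Once the group is empty,
    // the assumed shutdown marker separates a graceful shutdown from a failure.
    static AttemptState infer(Set<String> liveProcessorIds, boolean shutdownMarkerPresent) {
        if (!liveProcessorIds.isEmpty()) {
            return AttemptState.ACTIVE;
        }
        return shutdownMarkerPresent ? AttemptState.SHUTDOWN : AttemptState.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(processorsPath("jobname", "jobid", 2));
        System.out.println(infer(Collections.<String>emptySet(), false));
    }
}
```

With a marker of this kind, an empty processing group is no longer ambiguous, and restarted processors could look up the attempt number to find the subtree they should join.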
 


> Implement startup and shutdown sequence of jobs in ZK environment
> -----------------------------------------------------------------
>
>                 Key: SAMZA-1113
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1113
>             Project: Samza
>          Issue Type: Sub-task
>            Reporter: Navina Ramesh
>            Assignee: Navina Ramesh
>             Fix For: 0.13.0
>
>
> The problem that we need to solve is: Do we need multiple job attempts in the ZK tree? If yes, who creates the persistent subtrees? There is no leader until the ZK trees are set up.
> In the initial prototype, the first processor instance creates the ZK hierarchy. If we were to support multiple job attempts, then we would need a different ZK tree for each attempt. How do all the processors within a job know which attempt ID to join?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)