You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by "Prateek Maheshwari (JIRA)" <ji...@apache.org> on 2019/03/19 17:42:00 UTC

[jira] [Resolved] (SAMZA-1832) Race condition between SamzaContainerListener.onContainerFailure(t) and StreamProcessor.stop()

     [ https://issues.apache.org/jira/browse/SAMZA-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prateek Maheshwari resolved SAMZA-1832.
---------------------------------------
    Resolution: Fixed

> Race condition between SamzaContainerListener.onContainerFailure(t) and StreamProcessor.stop()
> ----------------------------------------------------------------------------------------------
>
>                 Key: SAMZA-1832
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1832
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Yi Pan (Data Infrastructure)
>            Assignee: Bharath Kumarasubramanian
>            Priority: Major
>             Fix For: 1.0
>
>
> There is a race condition between SamzaContainerListener.onContainerFailure(t) and StreamProcessor.stop() that may mistakenly return a successful stopping state even there is an exception happened in the SamzaContainer.
> The sequence of events are:
> Thread-1: SamzaContainer.run() -> exception happened -> status set to FAILED -> start shutdown sequence
> Thread-2: User called LocalApplicationRunner.kill() -> StreamProcessor.stop() -> stopSamzaContainer(): since container status is failed, skipped waiting -> jobCoordinator.stop() -> since callback SamzaContainerListener.onContainerFailure() is only called after the shutdown sequence, containerException is not set -> normal shutdown of StreamProcessor -> appStatus = SuccessfulFinish
> The issue here is when StreamProcessor.stop() calls stopSamzaContainer(), it needs to wait till the callback of SamzaContainerListener.onContainerFailure(t) finishes before making the decision that the container is stopped successfully/failed.
> Reproduce steps:
> - Write an StreamApplication that throws exception in process()
> - Using StreamApplicationIntegrationTestHarness to start the application by runApplication()
> - In the test, immediately call runner.kill(); runner.waitForFinish(); runner.status()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)