Posted to commits@samza.apache.org by "Bharath Kumarasubramanian (Jira)" <ji...@apache.org> on 2022/02/10 01:42:00 UTC

[jira] [Commented] (SAMZA-2721) Container should exit with non-zero status code in case of errors during launch

    [ https://issues.apache.org/jira/browse/SAMZA-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489906#comment-17489906 ] 

Bharath Kumarasubramanian commented on SAMZA-2721:
--------------------------------------------------

Here is a sample case where the container hit the code path that swallows the exception and only logs it:
{code:java}
2022-02-03 00:15:53.624 [crm-sync-entity-resolution-pipeline-i001-auditor] KafkaProducer [INFO] [Producer clientId=crm-sync-entity-resolution-pipeline-i001-auditor] Closing the Kafka producer with timeoutMillis = 9223370393007422183 ms.
2022-02-03 00:15:53.625 [main] LiKafkaConsumerImpl [INFO] Shutdown complete in 11 millis
2022-02-03 00:15:53.626 [main] ContainerLaunchUtil [ERROR] Container stopped with Exception. 
2022-02-03 00:15:53.626 [main] CoordinatorStreamStore [INFO] Stopping the coordinator stream system consumer.
2022-02-03 00:15:53.626 [main] LiKafkaConsumerImpl [INFO] Shutting down ...
2022-02-03 00:15:53.627 [main] AbstractAuditor [INFO] Closing auditor with timeout 9223372036854775807 MILLI {code}
 

https://github.com/apache/samza/blob/2e7c2fe6c095d05b1b75fc0c2768ad0e9a81a085/samza-core/src/main/java/org/apache/samza/runtime/ContainerLaunchUtil.java#L185

> Container should exit with non-zero status code in case of errors during launch
> -------------------------------------------------------------------------------
>
>                 Key: SAMZA-2721
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2721
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Bharath Kumarasubramanian
>            Priority: Major
>
> {*}Problem{*}:
> During its launch sequence, ContainerLaunchUtil swallows exceptions and proceeds to shut down with a 0 status code. As a result, the AM does not restart the container.
> {*}Description{*}:
> In the run method, the launch sequence performs various initialization steps before kicking off the container. If an exception is thrown during these steps, the run method catches all errors but only logs them and proceeds to shut down as usual.
> Because the exit looks normal, the AM treats the container as having completed successfully and does not restart it, so the failed container remains failed.
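
The behaviour described in the issue above can be illustrated with a minimal, standalone sketch. This is not the actual ContainerLaunchUtil code: the class name, method names, and exit code values below are hypothetical and chosen only for illustration. It contrasts a launcher that logs and drops the exception (always returning 0) with one that records the failure and returns a non-zero status that a supervisor such as the AM could act on.

```java
public class LaunchExitCodeSketch {
    // Hypothetical exit codes; the values Samza should use are not stated in this issue.
    static final int EXIT_OK = 0;
    static final int EXIT_LAUNCH_FAILURE = 1;

    /** Mirrors the reported problem: the exception is logged and then dropped. */
    static int runSwallowing(Runnable launchSequence) {
        try {
            launchSequence.run();
        } catch (Exception e) {
            System.err.println("Container stopped with Exception. " + e);
        }
        // Always 0, so the caller cannot tell a failed launch from a clean one.
        return EXIT_OK;
    }

    /** One possible fix: remember the failure and surface it in the exit status. */
    static int runPropagating(Runnable launchSequence) {
        int status = EXIT_OK;
        try {
            launchSequence.run();
        } catch (Exception e) {
            System.err.println("Container stopped with Exception. " + e);
            status = EXIT_LAUNCH_FAILURE;
        }
        // Any shutdown or cleanup work would still happen here before returning.
        return status;
    }

    public static void main(String[] args) {
        // Stand-in for an initialization step that fails during launch.
        Runnable failingLaunch = () -> {
            throw new IllegalStateException("init failed");
        };
        System.out.println("swallowing  -> " + runSwallowing(failingLaunch));
        System.out.println("propagating -> " + runPropagating(failingLaunch));
        // A real launcher would finish with System.exit(status) so the
        // process exit code reflects the failure.
    }
}
```

In this sketch the cleanup path is unchanged; only the final status differs, which is the signal the process supervisor needs in order to decide whether to restart the container.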



--
This message was sent by Atlassian Jira
(v8.20.1#820001)