Posted to commits@samza.apache.org by "Yi Pan (Data Infrastructure) (JIRA)" <ji...@apache.org> on 2016/12/22 23:36:58 UTC

[jira] [Updated] (SAMZA-1069) Deadlock between KafkaSystemProducer and KafkaProducer from kafka-clients lib

     [ https://issues.apache.org/jira/browse/SAMZA-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Pan (Data Infrastructure) updated SAMZA-1069:
------------------------------------------------
    Fix Version/s: 0.12.0

> Deadlock between KafkaSystemProducer and KafkaProducer from kafka-clients lib
> -----------------------------------------------------------------------------
>
>                 Key: SAMZA-1069
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1069
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Yi Pan (Data Infrastructure)
>             Fix For: 0.12.0
>
>
> We have identified a deadlock scenario between the main thread that calls KafkaSystemProducer.close() and the network thread of the KafkaProducer client library (kafka-clients), which invokes the callback function defined in KafkaSystemProducer.send().
> The scenario is as follows:
> # The SamzaContainer main thread catches an exception from a previous commit and the container initiates shutdown, which calls {code}KafkaSystemProducer.stop(){code}. This acquires the synchronized {code}producerLock{code} in {code}KafkaSystemProducer{code} and then calls {code}KafkaProducer.flush(){code} to wait for all pending requests to complete.
> # The {code}KafkaProducer{code} network I/O thread then invokes KafkaSystemProducer's callback function (from {code}RecordBatch.done(){code}). The callback blocks on the same {code}producerLock{code} in {code}KafkaSystemProducer{code}, so {code}RecordBatch.done(){code} can never go on to call {code}producerFuture.done(){code} and release the {code}CountDownLatch{code} that the main thread in {code}KafkaSystemProducer.close(){code} is waiting on. Each thread waits on the other: deadlock.
> We need to ensure that KafkaSystemProducer.close() cannot race with the callbacks triggered by KafkaProducer's network thread.
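The lock ordering described above can be reproduced without Samza or Kafka. The following is a minimal, hypothetical Java model, not the actual KafkaSystemProducer code and not necessarily the change committed for this issue. The names `DeadlockSketch`, `stop`, and `flushLatch` are assumptions: `producerLock` stands in for KafkaSystemProducer's lock, and `flushLatch` stands in for the latch that `KafkaProducer.flush()` waits on. A real `flush()` has no timeout; the timeout here only lets the demo detect the hang and return.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the SAMZA-1069 deadlock using only java.util.concurrent.
// No Samza or kafka-clients classes are used.
public class DeadlockSketch {

    /**
     * Simulates one shutdown. Returns true if the "flush" completed within timeoutMs,
     * false if it timed out (a real flush() would block forever instead).
     */
    public static boolean stop(boolean flushWhileHoldingLock, long timeoutMs)
            throws InterruptedException {
        final Object producerLock = new Object();
        final CountDownLatch flushLatch = new CountDownLatch(1);

        // Models the network I/O thread completing a batch: it first runs the
        // send() callback, which takes producerLock, and only afterwards
        // releases the latch that flush() is waiting on.
        Thread ioThread = new Thread(() -> {
            synchronized (producerLock) {
                // callback body: record the send result or exception
            }
            flushLatch.countDown();
        });
        ioThread.setDaemon(true);

        boolean flushed;
        if (flushWhileHoldingLock) {
            // Problematic ordering: hold producerLock while blocking in flush.
            synchronized (producerLock) {
                ioThread.start();
                flushed = flushLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
            }
        } else {
            // One possible alternative (an assumption, not the confirmed fix):
            // hold producerLock only to update shared state, then wait for
            // pending sends without holding it.
            synchronized (producerLock) {
                // e.g. mark the producer as stopping
            }
            ioThread.start();
            flushed = flushLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
        // Once producerLock is free the I/O thread can finish; join so it is not leaked.
        ioThread.join();
        return flushed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("flush while holding lock completed: " + stop(true, 500));
        System.out.println("flush outside lock completed: " + stop(false, 5000));
    }
}
```

In this model the first call times out and returns false, because the callback can never acquire `producerLock` while the main thread holds it. The second call returns true because the wait happens with the lock released. Another way to break the cycle, also only sketched here, would be for the callback not to take the shared lock at all.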



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)