Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2015/02/26 23:11:04 UTC

[jira] [Commented] (SAMZA-579) KafkaSystemConsumer drops SSPs on failure

    [ https://issues.apache.org/jira/browse/SAMZA-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339279#comment-14339279 ] 

Chris Riccomini commented on SAMZA-579:
---------------------------------------

Right now, I just call Thread.sleep(10000) in the KafkaSystemConsumer. This is a blocking operation, which effectively stops *all* consumption from the BrokerProxy until a leader is available. A better solution might be not to block, but to temporarily drop the SSP and try again in a few seconds. This would allow other, non-abdicated SSPs to continue being consumed in the meantime.
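
A rough sketch of the non-blocking alternative described above, for illustration only. It is not taken from the attached patch or from Samza itself: the class PendingLeaderRetries, its methods, and the hasLeader predicate (standing in for a leader lookup) are all hypothetical names. The idea is to park a leaderless partition with a "retry no earlier than" timestamp instead of sleeping, so the caller returns immediately and other partitions keep flowing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper (not a Samza class): tracks partitions waiting for a leader. */
class PendingLeaderRetries<K> {
    private final long retryDelayMs;
    // Partition key -> earliest time (ms) at which to look for a leader again.
    private final Map<K, Long> nextAttemptAt = new HashMap<>();

    PendingLeaderRetries(long retryDelayMs) {
        this.retryDelayMs = retryDelayMs;
    }

    /** Park a partition instead of sleeping; the caller is not blocked. */
    void defer(K partition, long nowMs) {
        nextAttemptAt.put(partition, nowMs + retryDelayMs);
    }

    /**
     * Re-check every parked partition whose delay has elapsed.
     * hasLeader is an assumed stand-in for a metadata/leader lookup.
     * Returns the partitions that now have a leader and can be consumed again.
     */
    List<K> retryDue(long nowMs, Predicate<K> hasLeader) {
        List<K> recovered = new ArrayList<>();
        Iterator<Map.Entry<K, Long>> it = nextAttemptAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Long> entry = it.next();
            if (entry.getValue() > nowMs) {
                continue; // not due yet
            }
            if (hasLeader.test(entry.getKey())) {
                recovered.add(entry.getKey());
                it.remove();
            } else {
                // Still leaderless: push the next attempt out by another delay.
                entry.setValue(nowMs + retryDelayMs);
            }
        }
        return recovered;
    }

    int pendingCount() {
        return nextAttemptAt.size();
    }
}
```

In this sketch, the consumer would call defer() where it currently sleeps, and call retryDue() periodically from its existing loop. Passing the current time in explicitly keeps the helper easy to test without real delays.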

> KafkaSystemConsumer drops SSPs on failure
> -----------------------------------------
>
>                 Key: SAMZA-579
>                 URL: https://issues.apache.org/jira/browse/SAMZA-579
>             Project: Samza
>          Issue Type: Bug
>          Components: kafka
>    Affects Versions: 0.9.0
>            Reporter: Chris Riccomini
>             Fix For: 0.9.0
>
>         Attachments: SAMZA-579-0.patch
>
>
> While running SAMZA-394, I discovered a bug in KafkaSystemConsumer that causes it to stop consuming under failure scenarios. This does not cause data loss, but it can wedge a container until it's restarted.
> The trigger appears to be a BrokerProxy fetching from a broker that's still coming up and hasn't yet claimed ownership of a TopicAndPartition. When the fetch fails, the BrokerProxy abdicate()s the TopicAndPartition, and KafkaSystemConsumer tries to refresh to get the leader. If there is no leader, the KafkaSystemConsumer drops the SSP. This happens in KafkaSystemConsumer.refreshBrokers.


