Posted to dev@kafka.apache.org by "Nicolae Marasoiu (JIRA)" <ji...@apache.org> on 2014/09/17 15:01:33 UTC
[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior
[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137181#comment-14137181 ]
Nicolae Marasoiu commented on KAFKA-1461:
-----------------------------------------
Hi,
So I guess in this block:
    try {
      trace("Issuing to broker %d of fetch request %s".format(sourceBroker.id, fetchRequest))
      response = simpleConsumer.fetch(fetchRequest)
    } catch {
      case t: Throwable =>
        if (isRunning.get) {
          warn("Error in fetch %s. Possible cause: %s".format(fetchRequest, t.toString))
          partitionMapLock synchronized {
            partitionsWithError ++= partitionMap.keys
          }
        }
    }
Should I add a case for the specific scenario of a connection timeout, refusal, or reset, and introduce a backoff on that path?
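To make the question concrete, here is a rough standalone sketch of what such a path could look like. It is not the actual fetcher code: FetchBackoffSketch, isConnectionError, backoffMs, fetchWithBackoff and the 100 ms / 5000 ms values are all hypothetical names and numbers chosen for illustration, and treating java.net.SocketException plus SocketTimeoutException as "connection timeout/refused/reset" is an assumption about how those failures surface.

```scala
import java.io.IOException
import java.net.{SocketException, SocketTimeoutException}

// Hypothetical helper object; nothing here is part of Kafka itself.
object FetchBackoffSketch {

  // Assumption: refused and reset connections surface as SocketException
  // (ConnectException is a subclass), timeouts as SocketTimeoutException.
  def isConnectionError(t: Throwable): Boolean = t match {
    case _: SocketException | _: SocketTimeoutException => true
    case _ => false
  }

  // Exponential backoff: baseMs * 2^attempt, capped at maxMs.
  // The shift is clamped so the multiplication cannot overflow a Long.
  def backoffMs(attempt: Int, baseMs: Long = 100L, maxMs: Long = 5000L): Long = {
    val shift = math.min(math.max(attempt, 0), 20)
    math.min(maxMs, baseMs << shift)
  }

  // Calls `fetch` until it succeeds, `isRunning` turns false, or
  // `maxAttempts` is reached. Only connection errors trigger a sleep and
  // a retry; any other exception propagates to the caller unchanged.
  def fetchWithBackoff[A](
      fetch: () => A,
      isRunning: () => Boolean,
      maxAttempts: Int,
      sleep: Long => Unit = (ms: Long) => Thread.sleep(ms)
  ): Option[A] = {
    @scala.annotation.tailrec
    def loop(attempt: Int): Option[A] =
      if (!isRunning() || attempt >= maxAttempts) None
      else {
        val result: Either[Throwable, A] =
          try Right(fetch())
          catch { case e: IOException if isConnectionError(e) => Left(e) }
        result match {
          case Right(value) => Some(value)
          case Left(_) =>
            sleep(backoffMs(attempt)) // wait before the next attempt
            loop(attempt + 1)
        }
      }
    loop(0)
  }
}
```

In the real thread the sleep would presumably have to remain interruptible on shutdown and the delay would come from broker configuration rather than constants; the sketch only shows the shape of a separate catch path for connection errors.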
> Replica fetcher thread does not implement any back-off behavior
> ---------------------------------------------------------------
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
> Issue Type: Improvement
> Components: replication
> Affects Versions: 0.8.1.1
> Reporter: Sam Meder
> Assignee: nicu marasoiu
> Labels: newbie++
>
> The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some.