Posted to dev@kafka.apache.org by "Nicolae Marasoiu (JIRA)" <ji...@apache.org> on 2014/09/17 15:01:33 UTC

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

    [ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137181#comment-14137181 ] 

Nicolae Marasoiu commented on KAFKA-1461:
-----------------------------------------

Hi,

So I guess the change belongs in this block:

    try {
      trace("Issuing to broker %d of fetch request %s".format(sourceBroker.id, fetchRequest))
      response = simpleConsumer.fetch(fetchRequest)
    } catch {
      case t: Throwable =>
        if (isRunning.get) {
          warn("Error in fetch %s. Possible cause: %s".format(fetchRequest, t.toString))
          partitionMapLock synchronized {
            partitionsWithError ++= partitionMap.keys
          }
        }
    }

Should I add a case for the specific scenario of a connection timeout, refusal, or reset, and introduce a back-off on that path?
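
To make the question concrete, here is a minimal, self-contained sketch of what such a case might look like. It is not the actual fetcher thread code: FlakyFetcher is a hypothetical stand-in for simpleConsumer.fetch, fetchBackoffMs is an assumed setting rather than an existing config, and matching on SocketException/SocketTimeoutException assumes those exceptions reach the catch block unwrapped.

```scala
import java.net.{ConnectException, SocketException, SocketTimeoutException}

object FetchBackoffSketch {
  // Assumed back-off interval; hypothetical, not an existing Kafka setting.
  val fetchBackoffMs: Long = 100L

  // Hypothetical stand-in for simpleConsumer.fetch(fetchRequest):
  // throws "connection refused" for the first `failures` calls, then succeeds.
  final class FlakyFetcher(failures: Int) {
    private var calls = 0
    def fetch(): String = {
      calls += 1
      if (calls <= failures) throw new ConnectException("Connection refused")
      "response"
    }
  }

  // One iteration of the fetch loop.
  // Returns Some(response) on success and None on any error.
  // `sleep` is injected so the back-off can be observed without real waiting.
  def doFetch(fetcher: FlakyFetcher, sleep: Long => Unit): Option[String] =
    try {
      Some(fetcher.fetch())
    } catch {
      // Connection refused/reset (SocketException, which ConnectException extends)
      // or a timeout: pause before the caller retries.
      case _: SocketException | _: SocketTimeoutException =>
        sleep(fetchBackoffMs)
        None
      // Any other error keeps the existing behaviour (no back-off in this sketch).
      case _: Throwable =>
        None
    }
}
```

In the real thread the sleep argument would presumably be something like Thread.sleep (or an interruptible wait, so shutdown is not delayed), and the existing partitionsWithError bookkeeping would stay in both cases.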

> Replica fetcher thread does not implement any back-off behavior
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1461
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1461
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.8.1.1
>            Reporter: Sam Meder
>            Assignee: nicu marasoiu
>              Labels: newbie++
>
> The current replica fetcher thread will retry in a tight loop if any error occurs during the fetch call. For example, we've seen cases where the fetch continuously throws a connection refused exception leading to several replica fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, although the fact that erroring partitions are removed so a leader can be re-discovered helps some.


