Posted to users@kafka.apache.org by Felipe Santos <fe...@gmail.com> on 2016/12/27 17:17:30 UTC

Connectivity problem with controller breaks cluster

Hi,

We are using Kafka 0.10.1.0.

We have three brokers and three ZooKeeper nodes.

Today brokers 1 and 2 lost connectivity with broker 3, and I saw that broker
3 was the controller.
I saw a lot of messages like:
"[rw_campaign_broadcast_nextel_734fae3d46d4da63ee36d2b6fd25a77f3f7c3ef5,9]
on broker 3: Shrinking ISR for partition
[rw_campaign_broadcast_nextel_734fae3d46d4da63ee36d2b6fd25a77f3f7c3ef5,9]
from 1,2,3 to 3"
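To double-check which broker holds the controller role, the `/controller` znode in ZooKeeper can be read (for example with `bin/zookeeper-shell.sh <zk-host>:2181 get /controller`); it holds a small JSON payload. A minimal Python sketch for extracting the broker id from such a payload (the sample value below is illustrative, not taken from this cluster):

```python
import json

# Example payload as stored in the /controller znode (illustrative values);
# in practice, fetch it with: bin/zookeeper-shell.sh zk1:2181 get /controller
payload = '{"version":1,"brokerid":3,"timestamp":"1482826800000"}'

controller = json.loads(payload)
print(controller["brokerid"])  # id of the broker currently acting as controller
```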

On brokers 1 and 2:

[2016-12-27 08:10:05,501] WARN [ReplicaFetcherThread-0-3], Error in fetch
kafka.server.ReplicaFetcherThread$FetchRequest@108fd1b0
(kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 3 was disconnected before the response
was read
        at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
        at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
        at scala.Option.foreach(Option.scala:257)
        at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
        at
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
        at
kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
        at
kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
        at
kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
        at
kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
        at
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
        at
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
        at
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
        at
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

All my consumers and producers went down.
I tried to consume and produce with kafka-console-producer.sh and
kafka-console-consumer.sh, and both failed.

The only solution was to restart broker 3; after that, the problem went away.

Any tips?
-- 
Felipe Santos

Re: Connectivity problem with controller breaks cluster

Posted by Apurva Mehta <ap...@confluent.io>.
Looks like you are hitting: https://issues.apache.org/jira/browse/KAFKA-4477

You can try upgrading to 0.10.1.1 and see if the issue recurs (a number of
deadlock bugs were fixed there, which might explain this issue). Alternatively,
you can provide the data described in
https://issues.apache.org/jira/browse/KAFKA-4477?focusedCommentId=15749722&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15749722
so that we can diagnose the problem.

As it stands, this seems to be a bug introduced in 0.10.1.0. We don't have
enough information to identify the root cause. If you can provide the trace
logging requested on that ticket, it would help.
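For reference, the controller and state-change log levels are set in the broker's `config/log4j.properties`; a fragment along these lines should give the requested detail (the logger and appender names below match the ones shipped in Kafka's default log4j.properties, but adjust the appenders to your setup):

```properties
# config/log4j.properties (fragment) -- controller and state-change tracing
log4j.logger.kafka.controller=TRACE, controllerAppender
log4j.additivity.kafka.controller=false
log4j.logger.state.change.logger=TRACE, stateChangeAppender
log4j.additivity.state.change.logger=false
```

The output lands in controller.log and state-change.log under the broker's log directory, which is the data the ticket asks for.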

Thanks,
Apurva
