Posted to users@kafka.apache.org by Chris Neal <cw...@gmail.com> on 2016/04/13 18:55:38 UTC

Help understanding a failure please.

Hi all.

I'm running a two-node cluster that has been rock solid for almost a year
and a half.  We experienced an outage of one of the two brokers this
morning, and from the logs, I'm not sure what happened or how to prevent
it.

The Kafka version is 0.8.1.1 with Scala 2.10.  The Java version is OpenJDK
1.8.0_65.

Everything was running fine, then:

[2016-04-13 11:01:28,306] WARN Reconnect due to socket error: Received -1 when reading from channel, socket has likely been closed. (kafka.consumer.SimpleConsumer)
[2016-04-13 11:01:28,306] WARN Reconnect due to socket error: Received -1 when reading from channel, socket has likely been closed. (kafka.consumer.SimpleConsumer)
[2016-04-13 11:01:28,306] WARN Reconnect due to socket error: Received -1 when reading from channel, socket has likely been closed. (kafka.consumer.SimpleConsumer)
[2016-04-13 11:01:28,306] WARN Reconnect due to socket error: Received -1 when reading from channel, socket has likely been closed. (kafka.consumer.SimpleConsumer)
[2016-04-13 11:01:28,334] WARN Reconnect due to socket error: Received -1 when reading from channel, socket has likely been closed. (kafka.consumer.SimpleConsumer)
[2016-04-13 11:01:28,334] WARN Reconnect due to socket error: Received -1 when reading from channel, socket has likely been closed. (kafka.consumer.SimpleConsumer)
[2016-04-13 11:01:28,334] WARN Reconnect due to socket error: Received -1 when reading from channel, socket has likely been closed. (kafka.consumer.SimpleConsumer)
[2016-04-13 11:01:28,334] WARN Reconnect due to socket error: Received -1 when reading from channel, socket has likely been closed. (kafka.consumer.SimpleConsumer)

[2016-04-13 11:01:28,352] ERROR [ReplicaFetcherThread-1-0], Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 9644043; ClientId: ReplicaFetcherThread-1-0; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo:* [snip of every topic and partition on the broker listed here]*
java.net.ConnectException: Connection refused
        at sun.nio.ch.Net.connect0(Native Method)
        at sun.nio.ch.Net.connect(Net.java:454)
        at sun.nio.ch.Net.connect(Net.java:446)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
        at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
        at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
        at kafka.consumer.SimpleConsumer.reconnect(SimpleConsumer.scala:57)
        at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:79)
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
        at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
        at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
        at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

The logs then spam that ERROR and Exception 5406 times between:
2016-04-13 11:01:28,352 and 2016-04-13 11:01:31,994
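
As a rough back-of-the-envelope check, the two timestamps and the count above give the rate at which the fetch error was logged. The sketch below is only arithmetic on those three numbers; the function name is made up for illustration. A rate this high would be consistent with the fetcher retrying with little or no delay between attempts, but that is an inference from the numbers, not something the log states.

```python
from datetime import datetime

# Timestamp layout used by the broker log lines quoted above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def errors_per_second(first: str, last: str, count: int) -> float:
    """Average number of log entries per second between two log timestamps."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(last, LOG_TIME_FORMAT)
    return count / (end - start).total_seconds()


# Values taken directly from the message: 5406 errors in about 3.6 seconds.
rate = errors_per_second("2016-04-13 11:01:28,352", "2016-04-13 11:01:31,994", 5406)
print(f"{rate:.0f} errors per second")  # prints "1484 errors per second"
```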

Then I get this message twice:
[2016-04-13 11:01:31,997] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions [snip list of all my topics and partitions listed]

Then this:
[2016-04-13 11:01:32,061] INFO [ReplicaFetcherThread-1-0], Shutting down (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,061] INFO [ReplicaFetcherThread-1-0], Shutting down (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,113] INFO New leader is 1 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2016-04-13 11:01:32,113] INFO New leader is 1 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-1-0], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-1-0], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-1-0], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-1-0], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-0-0], Shutting down (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-0-0], Shutting down (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-0-0], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-0-0], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-0-0], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-0-0], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-3-0], Shutting down (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-3-0], Shutting down (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-3-0], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-3-0], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-3-0], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-3-0], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-2-0], Shutting down (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-2-0], Shutting down (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,395] INFO [ReplicaFetcherThread-2-0], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,395] INFO [ReplicaFetcherThread-2-0], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,395] INFO [ReplicaFetcherThread-2-0], Shutdown completed (kafka.server.ReplicaFetcherThread)
[2016-04-13 11:01:32,395] INFO [ReplicaFetcherThread-2-0], Shutdown completed (kafka.server.ReplicaFetcherThread)


At this point, there are no more errors in the log file, but all the
consumers are still trying to consume from this broker, and are getting
Connection Refused exceptions.  It wasn't until I cycled the broker that
things got back to normal.

Can anyone tell me what happened?  Or why consumers didn't recognize that
there was a problem with this broker and start consuming from the other one?

Can I provide any more details? :)

Thank you so much for your time!

Re: Help understanding a failure please.

Posted by Chris Neal <cw...@gmail.com>.
:)  Not lame.  Valid question!

Part of the problem is that the exception doesn't tell me where the
connection refused is coming from.  No IP address, hostname, or
application name is part of the error, so I have no idea which system
the problem is occurring on!
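
One hint the log lines may already carry is the fetcher thread name. The sketch below assumes the 0.8.x naming convention is ReplicaFetcherThread-<fetcherId>-<sourceBrokerId>; that is an assumption about the broker source, worth verifying against the code for your version, and the helper name is hypothetical. If the assumption holds, "ReplicaFetcherThread-1-0" logged on broker 1 would mean the refused connection was to broker 0.

```python
import re

# Assumed naming convention: ReplicaFetcherThread-<fetcherId>-<sourceBrokerId>.
# This is an assumption about Kafka 0.8.x, not something the log itself states.
FETCHER_NAME = re.compile(r"ReplicaFetcherThread-(\d+)-(\d+)")


def parse_fetcher_thread(log_line):
    """Return (fetcher_id, source_broker_id) from a log line, or None if absent."""
    match = FETCHER_NAME.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = "[2016-04-13 11:01:28,352] ERROR [ReplicaFetcherThread-1-0], Error in fetch"
fetcher_id, source_broker_id = parse_fetcher_thread(line)
print(f"fetcher {fetcher_id} was fetching from broker {source_broker_id}")
```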

I was able to ssh to the broker server, and the other broker in the cluster
was still able to communicate with the problematic one, so there was
definitely network connectivity at some level.

Chris

On Wed, Apr 13, 2016 at 7:38 PM, R Krishna <kr...@gmail.com> wrote:

> Sorry, if this sounds lame, but can you ping or telnet?

Re: Help understanding a failure please.

Posted by R Krishna <kr...@gmail.com>.
Sorry, if this sounds lame, but can you ping or telnet?
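
For anyone who wants a scriptable version of the telnet check, here is a minimal sketch that separates "connection refused" (the host answered, but nothing is listening on that port) from a timeout (no answer at all). The function name is made up for illustration, and the host and port in the usage comment are placeholders; 9092 is only the usual Kafka default, so substitute whatever your broker is configured to listen on.

```python
import socket


def check_tcp(host, port, timeout=3.0):
    """Try one TCP connection and report 'open', 'refused', 'timeout', or an error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host is reachable but no process is accepting on this port.
        return "refused"
    except socket.timeout:
        # No reply within the timeout: host down, or packets dropped on the way.
        return "timeout"
    except OSError as exc:
        # Anything else, for example a DNS failure or an unreachable network.
        return f"error: {exc}"


# Example usage (hypothetical host name, assumed default port):
#   print(check_tcp("broker0.example.com", 9092))
```

A "refused" result while ssh to the same machine still works would suggest the broker process is no longer listening, rather than a network problem.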

-- 
Radha Krishna, Proddaturi
253-234-5657