Posted to users@kafka.apache.org by Siva A <si...@gmail.com> on 2018/03/01 17:00:55 UTC

Re: Consumer group intermittently can not read any records from a cluster with 3 nodes that has one node down

Hi

Check the replication factor of the __consumer_offsets topic. If it is set
to 1, that's the issue: increase the replication factor of that topic.

Thanks
Siva
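
The advice above can be illustrated with a small sketch. In Kafka, a consumer
group's coordinator is the broker that leads one partition of
__consumer_offsets, and that partition is picked by hashing the group id. If
the topic has a replication factor of 1, a group that hashes to a partition
stored only on the down broker cannot find its coordinator, while other groups
keep working, which would be consistent with the symptom described in this
thread. The Python below is only an illustration under stated assumptions: the
round-robin replica layout, the broker ids 1 to 3 and the group names are made
up, the 50-partition count assumes the default offsets.topic.num.partitions,
and in-sync replica state is ignored.

```python
from typing import Dict, List, Set

INT_MIN = -2**31


def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() with 32-bit signed overflow."""
    h = 0
    data = s.encode("utf-16-be")  # Java hashes UTF-16 code units
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Partition of __consumer_offsets that holds this group's offsets."""
    h = java_string_hash(group_id)
    non_negative = 0 if h == INT_MIN else abs(h)
    return non_negative % num_partitions


def round_robin_assignment(num_partitions: int, brokers: List[int],
                           replication_factor: int) -> Dict[int, List[int]]:
    """Hypothetical replica layout; a real cluster's layout may differ."""
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def coordinator_reachable(group_id: str, assignment: Dict[int, List[int]],
                          live_brokers: Set[int]) -> bool:
    """True if at least one replica of the group's offsets partition is up.

    Simplification: any live replica is assumed to be able to become leader.
    """
    partition = offsets_partition_for(group_id, len(assignment))
    return any(b in live_brokers for b in assignment[partition])


if __name__ == "__main__":
    brokers = [1, 2, 3]
    live = {1, 2}  # assumption for the demo: broker 3 is the one that is down
    for rf in (1, 3):
        layout = round_robin_assignment(50, brokers, rf)
        for group in ("group-on-my-machine", "group-on-other-machine"):
            ok = coordinator_reachable(group, layout, live)
            print(f"RF={rf} group={group!r} "
                  f"partition={offsets_partition_for(group)} reachable={ok}")
```

With a replication factor of 3 every group stays reachable as long as one
broker is up; with a replication factor of 1 the outcome depends on where the
group id happens to hash. On a real cluster the actual replica layout of
__consumer_offsets can be read with the kafka-topics tool's --describe option.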

On Feb 21, 2018 1:35 PM, "Sandor Murakozi" <sm...@gmail.com> wrote:

> hi Behrang,
> I recommend checking out some docs that explain how partitions and
> replication work (e.g.
> https://sookocheff.com/post/kafka/kafka-in-a-nutshell/)
>
> What I'd highlight is that the partition leader and the controller are two
> different concepts. Each partition has its own leader, and it's the leader,
> not the controller, that's responsible for dealing with producers and
> consumers.
>
> Cheers,
> Sandor
>
> On Tue, Feb 20, 2018 at 12:50 PM, Behrang <be...@gmail.com> wrote:
>
> > Hi Sandor,
> >
> > Thanks for your reply. I am not at work right now, but I still am a bit
> > confused about what happened at work:
> >
> > 1- One thing that I confirmed was that one of the 3 nodes was definitely
> > down. We were unable to telnet into its Kafka port from anywhere. The other
> > two nodes were up and we could telnet into their Kafka port.
> >
> > 2- I modified my app a bit and implemented a means for sending
> > DescribeCluster requests to the cluster, setting bootstrap-servers to all
> > 3 nodes. The result indicated that the controller node
> > (https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/admin/DescribeClusterResult.html#controller())
> > had an id that was not amongst the nodes
> > (https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/admin/DescribeClusterResult.html#nodes()).
> > It was the same node that was down (i.e. I could telnet into the other
> > nodes but not the controller node). And this was always the same: even
> > after a few minutes, the controller node's id was still the same.
> >
> > 3- Despite that, when running my app from my machine, I could get records
> > from the topics I had subscribed to, but from another machine, no records
> > were getting sent to the app. The app running on the other machine had a
> > different consumer group, though.
> >
> > 4- The cluster had three nodes and when the controller node was down, most
> > of the time I was getting a message like this: *"Connection to node N
> > could not be established. Broker may not be available."* where N was either
> > -1, -2, or -3, but at one point in my app's logs I found a handful of
> > entries in which N was a very large number (e.g. 2156987456).
> >
> > I assume our cluster was misbehaving, but still can't explain why my app
> > was working like this.
> >
> >
> > Best regards,
> > Behrang Saeedzadeh
> >
> > On 20 February 2018 at 19:22, Sandor Murakozi <sm...@gmail.com> wrote:
> >
> > > Hi Behrang,
> > >
> > > All reads and writes of a partition go through the leader of that
> > > partition.
> > > If the leader of a partition is down you will not be able to
> > > produce/consume data in it until a new leader is elected. Typically this
> > > happens within a few seconds; after that you should be able to use the
> > > partition again. If your problem persists I recommend figuring out why
> > > leader election does not happen.
> > > You might be able to work with other partitions, at least those that have
> > > leaders on brokers that are up.
> > >
> > > Cheers,
> > > Sandor Murakozi
> > >
> > > On Tue, Feb 20, 2018 at 9:00 AM, Behrang <be...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a Kafka cluster with 3 nodes.
> > > >
> > > > I pass the nodes in the cluster to a consumer app I am building as
> > > > bootstrap servers.
> > > >
> > > > When one of the nodes in the cluster is down, the consumer group sometimes
> > > > CAN read records from the server but sometimes CAN NOT.
> > > >
> > > > In both cases, the same Kafka node is down.
> > > >
> > > > Is this behavior normal? Isn't it enough to only have one of the nodes in
> > > > the Kafka cluster be up and running? I have not delved much into setup and
> > > > administration of Kafka clusters, but I thought Kafka uses the nodes for HA
> > > > and as long as one node is up and running, the cluster remains healthy and
> > > > working.
> > > >
> > > > Best regards,
> > > > Behrang Saeedzadeh
> > > >
> > >
> >
>