Posted to users@kafka.apache.org by Shailesh Hemdev <sh...@foresee.com> on 2016/12/14 02:41:44 UTC

Kafka Errors and Hung Brokers

We are using a 3-node Kafka cluster and are encountering some weird issues.

1) On each node, when we tail the server.log file under /var/log/kafka, we
see continuous errors like these:

pic-partition. (kafka.server.ReplicaFetcherThread)
[2016-12-14 02:39:30,747] ERROR [ReplicaFetcherThread-0-4410000], Error for
partition [dev-core-platform-logging,15] to broker
4410000:org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.
(kafka.server.ReplicaFetcherThread)

The broker is up and is registered in ZooKeeper, so it is not clear why
these errors occur.
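
To see whether the errors are confined to a few partitions or spread across
many, the wrapped log text can be tallied per partition. The following is a
minimal Python sketch, assuming the message layout matches the excerpt above.
The regular expression and the function name count_not_leader_errors are
illustrative assumptions, not a documented Kafka log format or tool.

```python
import re
from collections import Counter

# Pattern derived from the excerpt in this thread (an assumption, not a spec).
ERROR_RE = re.compile(
    r"ERROR \[ReplicaFetcherThread-\d+-\d+\], "
    r"Error for partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"to broker (?P<broker>\d+):"
    r"(?P<exception>[\w.]+Exception)"
)

# The log excerpt as it appears above, including the mail client's line wraps.
SAMPLE = """\
[2016-12-14 02:39:30,747] ERROR [ReplicaFetcherThread-0-4410000], Error for
partition [dev-core-platform-logging,15] to broker
4410000:org.apache.kafka.common.errors.NotLeaderForPartitionException: This
server is not the leader for that topic-partition.
(kafka.server.ReplicaFetcherThread)
"""


def count_not_leader_errors(log_text):
    """Count NotLeaderForPartitionException errors per (topic, partition, broker)."""
    # Collapse all whitespace runs (including line wraps) into single spaces.
    flat = " ".join(log_text.split())
    counts = Counter()
    for match in ERROR_RE.finditer(flat):
        if match.group("exception").endswith("NotLeaderForPartitionException"):
            key = (
                match.group("topic"),
                int(match.group("partition")),
                int(match.group("broker")),
            )
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for key, n in count_not_leader_errors(SAMPLE).most_common():
        print(key, n)
```

Running it over the full server.log text instead of SAMPLE would show which
partitions and target brokers account for most of the errors.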

2) Occasionally a Kafka broker goes down. We have raised the open-files
ulimit and added 6g to the heap. When the broker goes down, the process
itself is still up but is deregistered from ZooKeeper.

Thanks,

*Shailesh *

-- 


This email communication (including any attachments) contains information 
from Answers Corporation or its affiliates that is confidential and may be 
privileged. The information contained herein is intended only for the use 
of the addressee(s) named above. If you are not the intended recipient (or 
the agent responsible to deliver it to the intended recipient), you are 
hereby notified that any dissemination, distribution, use, or copying of 
this communication is strictly prohibited. If you have received this email 
in error, please immediately reply to sender, delete the message and 
destroy all copies of it. If you have questions, please email 
legal@answers.com. 

If you wish to unsubscribe to commercial emails from Answers and its 
affiliates, please go to the Answers Subscription Center 
http://campaigns.answers.com/subscriptions to opt out.  Thank you.

Re: Kafka Errors and Hung Brokers

Posted by Shailesh Hemdev <sh...@foresee.com>.
Hi Apurva,

The first error vanished after I restarted all the brokers. I haven't seen
these recurring errors since, and my thought is that because we restarted the
ZooKeeper nodes, we might have put all the brokers in some sort of iffy state.

The broker occasionally hanging has plagued us quite a bit. Our Kafka
nodes and ZooKeeper nodes are all on EC2 instances. Kafka nodes communicate
with ZooKeeper nodes within the same VPC, but they go through a load balancer,
e.g. Kafka node --> internal load balancer specific to a ZK node --> ZK node.
This allows us to bring down ZK nodes and spin up new ones without having to
change the Kafka configuration. I am not sure whether this could cause an
issue. I haven't seen any specific ZK errors in /var/log/kafka/server.log.
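
One way to check the path from a Kafka node through the load balancer to a
ZK node is ZooKeeper's "ruok" four-letter command, which a running server
answers with "imok". Below is a minimal Python sketch under some assumptions:
the function names are made up for illustration, port 2181 is only the usual
default, and on newer ZooKeeper releases the command may need to be enabled
via 4lw.commands.whitelist. An "imok" reply only shows that the server
process is reachable and running, not that the ensemble has quorum.

```python
import socket


def ruok(sock):
    """Send 'ruok' on a connected socket; return True if the peer answers 'imok'."""
    sock.sendall(b"ruok")
    reply = b""
    while len(reply) < 4:
        chunk = sock.recv(4 - len(reply))
        if not chunk:  # peer closed the connection before sending 4 bytes
            break
        reply += chunk
    return reply == b"imok"


def zk_is_ok(host, port=2181, timeout=3.0):
    """Connect to host:port and run the 'ruok' check; False on any socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            return ruok(sock)
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

Calling zk_is_ok() from a Kafka node against both the load balancer address
and the ZK node's own address, and comparing the results over time, may help
show whether the load balancer hop ever drops or delays connections.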

Thanks,

Shailesh

On Wed, Dec 14, 2016 at 2:49 PM, Apurva Mehta <ap...@confluent.io> wrote:

> Regarding 1), you can see a NotLeaderForPartition exception if the leader
> for the partition has moved to another host but the client metadata has not
> updated itself yet. The messages should disappear once the metadata is
> updated on all clients.
>
> Leaders may move if brokers are bounced, or if they have connectivity
> issues with ZooKeeper. Looking at your second point, it seems like
> connectivity may be a problem. Where is ZooKeeper running? Do your brokers
> have a solid link to that machine? Do you see any ZooKeeper connection
> errors in your broker logs?
>



-- 

*Shailesh Hemdev*
Manager, Software Engineering
shailesh.hemdev@foresee.com
p (734) 352-6247

Re: Kafka Errors and Hung Brokers

Posted by Apurva Mehta <ap...@confluent.io>.
Regarding 1), you can see a NotLeaderForPartition exception if the leader
for the partition has moved to another host but the client metadata has not
updated itself yet. The messages should disappear once the metadata is
updated on all clients.
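
As a rough illustration of that sequence, here is a toy Python model (not
Kafka code): a fetcher keeps a cached copy of the partition-to-leader map,
the leader moves, and fetches fail until the cache is refreshed. All class
names are invented for this sketch, and broker id 4410001 is a hypothetical
second broker.

```python
class NotLeaderError(Exception):
    """Stand-in for Kafka's NotLeaderForPartitionException in this toy model."""


class ToyCluster:
    """Holds the authoritative partition -> leader broker id mapping."""

    def __init__(self, leaders):
        self.leaders = dict(leaders)

    def fetch(self, broker, partition):
        # A broker that is not the current leader rejects the request.
        if self.leaders[partition] != broker:
            raise NotLeaderError(
                f"broker {broker} is not the leader for partition {partition}"
            )
        return f"data from broker {broker}"


class ToyFetcher:
    """Fetches using a cached snapshot of the leader map, which can go stale."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.cached = dict(cluster.leaders)

    def refresh(self):
        # Re-read the current leader map, like a metadata update.
        self.cached = dict(self.cluster.leaders)

    def fetch(self, partition):
        return self.cluster.fetch(self.cached[partition], partition)


if __name__ == "__main__":
    cluster = ToyCluster({15: 4410000})
    fetcher = ToyFetcher(cluster)
    print(fetcher.fetch(15))          # leader is still 4410000
    cluster.leaders[15] = 4410001     # leadership moves (hypothetical broker)
    try:
        fetcher.fetch(15)             # cached leader is now stale
    except NotLeaderError as err:
        print("error:", err)
    fetcher.refresh()
    print(fetcher.fetch(15))          # succeeds against the new leader
```

In this model the error repeats on every fetch until refresh() is called,
which matches the idea that the messages persist for as long as the cached
leader information stays out of date.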

Leaders may move if brokers are bounced, or if they have connectivity
issues with ZooKeeper. Looking at your second point, it seems like
connectivity may be a problem. Where is ZooKeeper running? Do your brokers
have a solid link to that machine? Do you see any ZooKeeper connection
errors in your broker logs?
