Posted to users@kafka.apache.org by Denis Razuvaev <de...@netcracker.com.INVALID> on 2023/04/06 14:18:32 UTC

Kafka initiates shutdown due to a connectivity problem with ZooKeeper and a FatalExitError from ChangeNotificationProcessorThread

Hello,

We have run into a deadlock in Kafka several times; a similar issue is https://issues.apache.org/jira/browse/KAFKA-13544

The question: is it expected behavior that Kafka decides to shut down because of connectivity problems with ZooKeeper?
It seems to be related to the inability to read data from the /feature ZK node and to the ZooKeeperClientExpiredException thrown from the ZooKeeperClient class. This exception is caught only in the catch block of the doWork() method in ChangeNotificationProcessorThread, and there it leads to a FatalExitError.
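To make the described control flow concrete, here is a simplified, self-contained model. Every name in it (CatchAllSketch, SessionExpiredException, FatalExitError, FeatureChangeEvent) is a hypothetical stand-in rather than Kafka's real class, and it assumes, as the description above implies, that the catch block in doWork() turns any exception it receives into a fatal error.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of the control flow described above; not Kafka code.
// The names only mirror the roles of ZooKeeperClientExpiredException,
// FatalExitError and ChangeNotificationProcessorThread.doWork().
public class CatchAllSketch {

    // Stand-in for the exception raised when the ZooKeeper session has expired.
    static class SessionExpiredException extends RuntimeException {
        SessionExpiredException(String msg) { super(msg); }
    }

    // Stand-in for the error that makes the broker exit.
    static class FatalExitError extends Error {
        final int statusCode;
        FatalExitError(int statusCode) { this.statusCode = statusCode; }
    }

    // A queued "feature node changed" event; process() plays the role of
    // reading the /feature node and updating the local feature cache.
    interface FeatureChangeEvent {
        void process();
    }

    private final BlockingQueue<FeatureChangeEvent> queue = new LinkedBlockingQueue<>();

    void enqueue(FeatureChangeEvent event) { queue.add(event); }

    // One iteration of the processing loop (assumption: a catch-all handler).
    // Every Exception, including a session expiration, becomes a fatal error.
    void doWork() throws InterruptedException {
        FeatureChangeEvent event = queue.take();
        try {
            event.process();
        } catch (Exception e) {
            System.err.println("Failed to process feature ZK node change event: " + e.getMessage());
            throw new FatalExitError(1);
        }
    }

    // Processes a single event and reports whether it ended in a FatalExitError.
    static boolean endsInFatalExit(FeatureChangeEvent event) {
        CatchAllSketch listener = new CatchAllSketch();
        listener.enqueue(event);
        try {
            listener.doWork();
            return false;
        } catch (FatalExitError fatal) {
            return true;
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", ie);
        }
    }

    public static void main(String[] args) {
        // A session expiry ends in a fatal exit; a successful read does not.
        System.out.println(endsInFatalExit(() -> { throw new SessionExpiredException("session expired"); }));
        System.out.println(endsInFatalExit(() -> { }));
    }
}
```

Under this assumption, a session expiry that surfaces while the /feature node is being read is handled exactly like any other processing failure.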

This problem is still reproducible in newer Kafka versions (which already include the fix for the deadlock).

It is hard to write a synthetic test that reproduces the problem, but it can be reproduced locally in debug mode with the following steps:
1) Start ZooKeeper, then start Kafka in debug mode.
2) Emulate a connectivity problem between Kafka and ZooKeeper; for example, the connection can be closed via the Netcrusher library.
3) Put a breakpoint in the updateLatestOrThrow() method of the FeatureCacheUpdater class, before the zkClient.getDataAndVersion(featureZkNodePath) line is executed.
4) Restore the connection between Kafka and ZooKeeper after the session has expired. Kafka execution should stop at the breakpoint.
5) Resume execution until Kafka reaches the line zooKeeperClient.handleRequests(remainingRequests) in the retryRequestsUntilConnected method of the KafkaZkClient class.
6) Emulate a connectivity problem between Kafka and ZooKeeper again, and wait until the session expires.
7) Restore the connection between Kafka and ZooKeeper.
8) Kafka begins the shutdown process with the following error:
ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK node change event. The broker will eventually exit. (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
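The timing in steps 5 to 8 can be illustrated with a toy simulation. It is a sketch under stated assumptions, not Kafka code: ReproTimelineSketch, ScriptedClient, readWithRetry and the State values are hypothetical, and the model assumes that a lost connection is retried while an expired session surfaces as an exception to the caller.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical simulation of the timeline in the steps above; not Kafka code.
public class ReproTimelineSketch {

    enum State { CONNECTED, DISCONNECTED, EXPIRED }

    // Stand-in for the exception raised when the ZooKeeper session has expired.
    static class SessionExpiredException extends RuntimeException {
        SessionExpiredException(String msg) { super(msg); }
    }

    // Fake client: each read attempt consumes the next scripted connection state.
    static class ScriptedClient {
        private final Deque<State> script;

        ScriptedClient(State... states) {
            script = new ArrayDeque<>(Arrays.asList(states));
        }

        // Returns data when connected, null when the connection is lost
        // (retryable), and throws when the session has expired.
        String tryRead(String path) {
            State s = script.isEmpty() ? State.CONNECTED : script.poll();
            switch (s) {
                case CONNECTED:
                    return "data@" + path;
                case DISCONNECTED:
                    return null;
                default:
                    throw new SessionExpiredException("session expired while reading " + path);
            }
        }
    }

    // Assumption: retries only on connection loss; session expiry escapes the loop.
    static String readWithRetry(ScriptedClient client, String path, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String data = client.tryRead(path);
            if (data != null) {
                return data;
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    // Runs one scripted scenario and returns the data read, or "expired".
    static String outcome(State... script) {
        try {
            return readWithRetry(new ScriptedClient(script), "/feature", 10);
        } catch (SessionExpiredException e) {
            return "expired";
        }
    }

    public static void main(String[] args) {
        // Connection flaps but the session survives: the read eventually succeeds.
        System.out.println(outcome(State.DISCONNECTED, State.DISCONNECTED, State.CONNECTED));
        // Session expires while the read is being retried: the exception escapes.
        System.out.println(outcome(State.DISCONNECTED, State.EXPIRED, State.CONNECTED));
    }
}
```

In the second scenario the session expires while the read is still in progress, so the exception escapes the retry loop instead of being absorbed by it; according to the description above, that exception then reaches doWork().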

In a real environment, this problem can be triggered by network issues that repeatedly disconnect the broker from ZooKeeper and reconnect it within a short period of time.
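For comparison only, here is a hypothetical handler that treats session problems as transient while still failing fast on genuine cache-update errors. It is not taken from Kafka's source; ClassifyingHandlerSketch, CacheUpdateException and the Outcome values are invented for illustration.

```java
// Hypothetical alternative for comparison; not Kafka code and not a proposed patch.
public class ClassifyingHandlerSketch {

    // Stand-in for a ZooKeeper session-expiration error.
    static class SessionExpiredException extends RuntimeException {
        SessionExpiredException(String msg) { super(msg); }
    }

    // Stand-in for an error meaning the feature metadata itself is unusable.
    static class CacheUpdateException extends RuntimeException {
        CacheUpdateException(String msg) { super(msg); }
    }

    enum Outcome { PROCESSED, RETRY_LATER, FATAL_EXIT }

    interface FeatureChangeEvent { void process(); }

    // Classifies the failure instead of treating every exception as fatal.
    // Other runtime exceptions are not caught here and propagate to the caller.
    static Outcome handle(FeatureChangeEvent event) {
        try {
            event.process();
            return Outcome.PROCESSED;
        } catch (SessionExpiredException e) {
            // Transient: the caller could re-enqueue the event once the session is re-established.
            System.err.println("ZooKeeper session problem, will retry: " + e.getMessage());
            return Outcome.RETRY_LATER;
        } catch (CacheUpdateException e) {
            // Unusable feature metadata: keep the fail-fast behavior.
            System.err.println("Failed to update feature cache: " + e.getMessage());
            return Outcome.FATAL_EXIT;
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(() -> { }));
        System.out.println(handle(() -> { throw new SessionExpiredException("expired"); }));
        System.out.println(handle(() -> { throw new CacheUpdateException("invalid feature data"); }));
    }
}
```

The design choice here is simply to separate "cannot reach ZooKeeper right now" from "the data read from ZooKeeper is unusable", which is the distinction the question below turns on.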

So the question is: does Kafka begin the shutdown process in such scenarios by design, or is this a defect?

Regards,


