Posted to dev@kafka.apache.org by "Ron Dagostino (Jira)" <ji...@apache.org> on 2023/04/12 14:15:00 UTC

[jira] [Resolved] (KAFKA-14890) Kafka initiates shutdown due to connectivity problem with Zookeeper and FatalExitError from ChangeNotificationProcessorThread

     [ https://issues.apache.org/jira/browse/KAFKA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ron Dagostino resolved KAFKA-14890.
-----------------------------------
    Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-14887

> Kafka initiates shutdown due to connectivity problem with Zookeeper and FatalExitError from ChangeNotificationProcessorThread
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-14890
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14890
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.3.2
>            Reporter: Denis Razuvaev
>            Priority: Major
>
> Hello, 
> We have hit a deadlock in Kafka several times; a similar issue is https://issues.apache.org/jira/browse/KAFKA-13544 
> Question: is it expected behavior that Kafka shuts down because of connectivity problems with Zookeeper? It seems to be related to the inability to read data from the */feature* ZK node and the _ZooKeeperClientExpiredException_ thrown from the _ZooKeeperClient_ class. This exception is caught only in the catch block of the _doWork()_ method in _ChangeNotificationProcessorThread_, which leads to a _FatalExitError_. 
> This shutdown problem also reproduces in newer versions of Kafka, which already include the deadlock fix from KAFKA-13544. 
> It is hard to write a synthetic test that reproduces the problem, but it can be reproduced locally in debug mode with the following steps: 
> 1) Start Zookeeper and start Kafka in debug mode. 
> 2) Emulate a connectivity problem between Kafka and Zookeeper; for example, the connection can be closed with the Netcrusher library. 
> 3) Put a breakpoint in the _updateLatestOrThrow()_ method of the _FeatureCacheUpdater_ class, before the _zkClient.getDataAndVersion(featureZkNodePath)_ line executes. 
> 4) Restore the connection between Kafka and Zookeeper after the session expires. Kafka execution should stop at the breakpoint.
> 5) Resume execution until Kafka starts to execute the line _zooKeeperClient.handleRequests(remainingRequests)_ in the _retryRequestsUntilConnected_ method of the _KafkaZkClient_ class. 
> 6) Emulate the connectivity problem between Kafka and Zookeeper again and wait until the session expires. 
> 7) Restore the connection between Kafka and Zookeeper. 
> 8) Kafka begins the shutdown process because of: 
> _ERROR [feature-zk-node-event-process-thread]: Failed to process feature ZK node change event. The broker will eventually exit. (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)_ 
> In a real environment, this problem can be caused by network issues that make the broker disconnect from and reconnect to Zookeeper repeatedly within a short period. 
> I started a mail thread at [https://lists.apache.org/thread/gbk4scwd8g7mg2tfsokzj5tjgrjrb9dw] about this problem, but have received no answers.
> To me this looks like a defect that should be fixed, because Kafka initiates shutdown after the connection between Kafka and Zookeeper has been restored. 
> Thank you.
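
The failure path described in the quoted report (a session-expired exception raised during the feature node read, caught only by a catch-all in the work loop, and escalated to a fatal exit) can be modeled with a small, self-contained sketch. This is not Kafka's code: every class and method name below (FeatureNodeReadSketch, SessionExpiredException, FatalError, NodeReader, readOrFail, readWithRetry, flakyReader) is hypothetical and chosen only for illustration. The bounded retry is one possible alternative shown for contrast, not a statement of how the duplicate issue was or should be resolved.

```java
public class FeatureNodeReadSketch {

    /** Hypothetical stand-in for a session-expired failure from a ZooKeeper client. */
    static class SessionExpiredException extends RuntimeException {
        SessionExpiredException(String message) { super(message); }
    }

    /** Hypothetical stand-in for an error that makes the process exit. */
    static class FatalError extends RuntimeException {
        FatalError(Throwable cause) { super(cause); }
    }

    /** Hypothetical reader of the feature node; each call is one read attempt. */
    interface NodeReader {
        String read();
    }

    /** Behavior as described in the report: any exception in the work loop becomes fatal. */
    static String readOrFail(NodeReader reader) {
        try {
            return reader.read();
        } catch (Exception e) {
            throw new FatalError(e);
        }
    }

    /** Alternative sketch (an assumption): retry session expiry up to maxAttempts times. */
    static String readWithRetry(NodeReader reader, int maxAttempts) {
        SessionExpiredException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return reader.read();
            } catch (SessionExpiredException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new FatalError(last); // attempts exhausted, escalate
    }

    /** Builds a reader that fails with session expiry `failures` times, then returns `value`. */
    static NodeReader flakyReader(int failures, String value) {
        int[] remaining = {failures};
        return () -> {
            if (remaining[0] > 0) {
                remaining[0]--;
                throw new SessionExpiredException("session expired");
            }
            return value;
        };
    }

    public static void main(String[] args) {
        try {
            readOrFail(flakyReader(1, "features-v1"));
        } catch (FatalError e) {
            System.out.println("fatal on first expiry: " + e.getCause().getMessage());
        }
        System.out.println("with retry: " + readWithRetry(flakyReader(1, "features-v1"), 3));
    }
}
```

In this model a single session expiry during the read is enough to make readOrFail escalate, which mirrors steps 5 to 8 of the reproduction, while readWithRetry survives the same sequence as long as the session recovers within the attempt budget.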



--
This message was sent by Atlassian Jira
(v8.20.10#820010)