Posted to jira@kafka.apache.org by "Ismael Juma (Jira)" <ji...@apache.org> on 2020/05/26 13:55:00 UTC

[jira] [Comment Edited] (KAFKA-10041) Kafka upgrade fails from 1.1 to 2.4/2.5/trunk fails due to failure in ZooKeeper

    [ https://issues.apache.org/jira/browse/KAFKA-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116761#comment-17116761 ] 

Ismael Juma edited comment on KAFKA-10041 at 5/26/20, 1:54 PM:
---------------------------------------------------------------

The upgrade notes for Kafka mention this issue and the workaround:
{quote}ZooKeeper has been upgraded to 3.5.7, and a ZooKeeper upgrade from 3.4.X to 3.5.7 can fail if there are no snapshot files in the 3.4 data directory. This usually happens in test upgrades where ZooKeeper 3.5.7 is trying to load an existing 3.4 data dir in which no snapshot file has been created. For more details about the issue please refer to ZOOKEEPER-3056. A fix is given in ZOOKEEPER-3056, which is to set snapshot.trust.empty=true config in zookeeper.properties before the upgrade.
{quote}
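
As a rough sketch of how the workaround quoted above could be scripted (not an official Kafka or ZooKeeper tool), the following only adds the setting when the data directory has transaction logs but no snapshot files. The function names are hypothetical, and the script assumes the usual layout in which snapshots and logs live under {{version-2/}} inside {{dataDir}} as {{snapshot.*}} and {{log.*}}, and that {{dataDir}} is written as {{dataDir=<path>}} with no spaces around the equals sign.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the file layout are assumptions, see above.

# Succeeds only when <dataDir>/version-2 holds log.* files but no snapshot.* files.
needs_trust_empty() {
    dir="$1/version-2"
    ls "$dir"/log.* >/dev/null 2>&1 || return 1      # no txn logs: nothing to work around
    ls "$dir"/snapshot.* >/dev/null 2>&1 && return 1  # a snapshot exists: upgrade can proceed
    return 0
}

# Appends snapshot.trust.empty=true to the given zookeeper.properties when needed.
apply_workaround() {
    props="$1"
    # Read dataDir from the properties file (last occurrence wins).
    data_dir=$(sed -n 's/^dataDir=//p' "$props" | tail -n 1)
    if needs_trust_empty "$data_dir" && ! grep -q '^snapshot\.trust\.empty=' "$props"; then
        # Leading newline guards against a file that lacks a trailing newline.
        printf '\nsnapshot.trust.empty=true\n' >> "$props"
    fi
}
```

It would be run before starting the upgraded node, for example as {{apply_workaround config/zookeeper.properties}}. As far as I recall, the ZooKeeper admin guide recommends setting the value back to false once the upgrade has completed and a snapshot exists, so the line is best treated as temporary.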
 



> Kafka upgrade fails from 1.1 to 2.4/2.5/trunk fails due to failure in ZooKeeper
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-10041
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10041
>             Project: Kafka
>          Issue Type: Bug
>          Components: zkclient
>    Affects Versions: 2.4.0, 2.5.0, 2.6.0
>            Reporter: Zhuqi Jin
>            Priority: Major
>
> When we tested upgrading Kafka from 1.1 to 2.4/2.5, the upgraded node failed to start because of a known ZooKeeper issue, ZOOKEEPER-3056.
> The error messages are shown below:
>  
> {code:java}
> [2020-05-24 23:45:17,638] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>  at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
>  at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>  at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>  at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>  at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>  at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>  at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>  at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>  at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>  at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>  at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> {code}
>  
> {code:java}
> [2020-05-24 23:45:25,142] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
>  at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:259)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
>  at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:255)
>  at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:113)
>  at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1858)
>  at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:375)
>  at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:399)
>  at kafka.server.KafkaServer.startup(KafkaServer.scala:207)
>  at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
>  at kafka.Kafka$.main(Kafka.scala:84)
>  at kafka.Kafka.main(Kafka.scala){code}
> It can be reproduced through the following steps:
> 1. Start a single-node Kafka 1.1.
> 2. Create a topic and use kafka-producer-perf-test.sh to produce several messages.
> {code:java}
> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test 
> bin/kafka-producer-perf-test.sh --topic test --num-records 500 --record-size 300 --throughput 100 --producer-props bootstrap.servers=localhost:9092{code}
> 3. Upgrade the node to 2.4/2.5 with the same configuration. The upgraded node fails to start because ZooKeeper fails to start.
> Kafka 1.1 uses dependant-libs-2.11.12/zookeeper-3.4.10.jar, while Kafka 2.4/2.5/trunk (5302efb2d1b7a69bcd3173a13b2d08a2666979ed) use zookeeper-3.5.8.jar.
> The bug is fixed in zookeeper-3.6.0. Should we upgrade the ZooKeeper dependency of Kafka 2.4/2.5/trunk to zookeeper-3.6.0.jar?


