Posted to dev@kafka.apache.org by "Manikumar (Jira)" <ji...@apache.org> on 2019/12/06 09:56:00 UTC
[jira] [Resolved] (KAFKA-9267) ZkSecurityMigrator should not create /controller node
[ https://issues.apache.org/jira/browse/KAFKA-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikumar resolved KAFKA-9267.
------------------------------
Fix Version/s: 2.5.0
Resolution: Fixed
Issue resolved by pull request 7778
[https://github.com/apache/kafka/pull/7778]
> ZkSecurityMigrator should not create /controller node
> -----------------------------------------------------
>
> Key: KAFKA-9267
> URL: https://issues.apache.org/jira/browse/KAFKA-9267
> Project: Kafka
> Issue Type: Bug
> Components: admin
> Reporter: NanerLee
> Priority: Major
> Fix For: 2.5.0
>
>
> As the source code shows ([ZkSecurityMigrator.scala#L226|#L226]),
> _ZkSecurityMigrator_ checks and sets the ACL recursively for each path in _SecureRootPaths_, and _/controller_ is one of those paths.
> As a result, _zkClient.makeSurePersistentPathExists()_ creates the _/controller_ node if it does not already exist.
> _/controller_ is meant to be an *EPHEMERAL* node used for controller election, but _makeSurePersistentPathExists()_ creates a *PERSISTENT* node with *null* data.
> When that happens, the null data causes an *NPE*, no controller can be elected, and the Kafka cluster becomes unavailable.
> In addition, a *PERSISTENT* node does not disappear automatically; it has to be deleted manually to fix the problem.
>
> *PERSISTENT* _/controller_ node with *null* data in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 16] get /kafka/controller
> null
> cZxid = 0x1100002284
> ctime = Tue Dec 03 18:37:26 CST 2019
> mZxid = 0x1100002284
> mtime = Tue Dec 03 18:37:26 CST 2019
> pZxid = 0x1100002284
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 0{code}
> *Normal* /controller node in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 21] get /kafka/controller
> {"version":1,"brokerid":1001,"timestamp":"1575370170528"}
> cZxid = 0x11000023e1
> ctime = Tue Dec 03 18:49:30 CST 2019
> mZxid = 0x11000023e1
> mtime = Tue Dec 03 18:49:30 CST 2019
> pZxid = 0x11000023e1
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x16ecb572df50021
> dataLength = 57
> numChildren = 0{code}
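
The two listings above differ most usefully in the ephemeralOwner field: 0x0 for the stray persistent node, and a non-zero session id for the normal ephemeral one. As a rough illustration only, the following Java sketch reads that field out of pasted zkCli "get" output. The class and method names are hypothetical and are not part of Kafka or ZooKeeper.

```java
// Hypothetical helper, not part of Kafka or ZooKeeper: inspects the text
// printed by the zkCli "get" command and reports whether the znode is
// persistent, i.e. whether ephemeralOwner is zero.
class ControllerNodeCheck {

    static boolean isPersistent(String statOutput) {
        // \R matches any line break, so pasted output from any platform works.
        for (String line : statOutput.split("\\R")) {
            String[] parts = line.split("=", 2);
            if (parts.length == 2 && parts[0].trim().equals("ephemeralOwner")) {
                String value = parts[1].trim();
                // zkCli prints the owner session id in hex with a 0x prefix.
                String hex = value.startsWith("0x") ? value.substring(2) : value;
                return Long.parseUnsignedLong(hex, 16) == 0L;
            }
        }
        throw new IllegalArgumentException("no ephemeralOwner field found");
    }
}
```

If such a check reports a persistent /controller node, the node still has to be deleted by hand, as described above.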
> *NPE* in controller.log:
> {code:java}
> [2019-11-21 15:02:41,276] INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
> [2019-11-21 15:02:41,282] ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
> java.lang.NullPointerException
> at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
> at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2572)
> at kafka.utils.Json$.parseBytes(Json.scala:62)
> at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
> at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
> at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
> at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1148)
> at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
> at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
> at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
> at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82){code}
>
> So I have submitted a PR that makes _ZkSecurityMigrator_ skip the _/controller_ node when it does not exist.
> This bug appears to affect all versions; please review and merge the PR as soon as possible.
> Thanks!
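
The behaviour the reporter proposes can be sketched roughly as follows. This is a minimal Java illustration of the rule "do not create /controller if it is missing", not the code from the pull request. The class name, the method name, and the EPHEMERAL_PATHS set are assumptions made for the example, and "/brokers" in the usage note is only a sample of a path that is safe to pre-create.

```java
import java.util.Set;

// Hypothetical sketch of the proposed guard; not Kafka's actual implementation.
class MigratorGuardSketch {

    // Assumption: paths that are ephemeral at runtime and therefore must
    // never be pre-created as persistent znodes by the migrator.
    static final Set<String> EPHEMERAL_PATHS = Set.of("/controller");

    // Decides whether the migrator should process a root path.
    // An existing node is always processed so its ACL gets updated.
    // A missing node is processed (created, then secured) only if it is
    // not one of the ephemeral paths.
    static boolean shouldProcess(String path, boolean exists) {
        return exists || !EPHEMERAL_PATHS.contains(path);
    }
}
```

Under these assumptions, shouldProcess("/controller", false) is false, so the node is left for controller election to create, while shouldProcess("/controller", true) and shouldProcess("/brokers", false) are both true.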
--
This message was sent by Atlassian Jira
(v8.3.4#803005)