Posted to dev@kafka.apache.org by "Manikumar (Jira)" <ji...@apache.org> on 2019/12/06 09:56:00 UTC

[jira] [Resolved] (KAFKA-9267) ZkSecurityMigrator should not create /controller node

     [ https://issues.apache.org/jira/browse/KAFKA-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar resolved KAFKA-9267.
------------------------------
    Fix Version/s: 2.5.0
       Resolution: Fixed

Issue resolved by pull request 7778
[https://github.com/apache/kafka/pull/7778]

> ZkSecurityMigrator should not create /controller node
> -----------------------------------------------------
>
>                 Key: KAFKA-9267
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9267
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin
>            Reporter: NanerLee
>            Priority: Major
>             Fix For: 2.5.0
>
>
> As the source code shows ([ZkSecurityMigrator.scala#L226|#L226]),
> _ZkSecurityMigrator_ checks and sets the ACL recursively for each path in _SecureRootPaths_, and _/controller_ is one of those paths.
> As a result, _zkClient.makeSurePersistentPathExists()_ creates the _/controller_ node if it does not already exist.
> _/controller_ is an *EPHEMERAL* node used for controller election, but _makeSurePersistentPathExists()_ creates a *PERSISTENT* node with *null* data.
> When that happens, the null data causes an *NPE*, no controller can be elected, and the Kafka cluster becomes unavailable.
> In addition, a *PERSISTENT* node does not disappear automatically, so it has to be deleted manually to fix the problem.
>  
> *PERSISTENT* _/controller_ node with *null* data in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 16] get /kafka/controller
> null
> cZxid = 0x1100002284
> ctime = Tue Dec 03 18:37:26 CST 2019
> mZxid = 0x1100002284
> mtime = Tue Dec 03 18:37:26 CST 2019
> pZxid = 0x1100002284
> cversion = 0
> dataVersion = 0
> aclVersion = 1
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 0{code}
> *Normal* /controller node in zk:
> {code:java}
> [zk: localhost:2181(CONNECTED) 21] get /kafka/controller
> {"version":1,"brokerid":1001,"timestamp":"1575370170528"}
> cZxid = 0x11000023e1
> ctime = Tue Dec 03 18:49:30 CST 2019
> mZxid = 0x11000023e1
> mtime = Tue Dec 03 18:49:30 CST 2019
> pZxid = 0x11000023e1
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x16ecb572df50021
> dataLength = 57
> numChildren = 0{code}
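
The two listings above differ in ephemeralOwner: ZooKeeper reports 0x0 for a persistent znode and the owning session id for an ephemeral one. The following is a minimal sketch of that check applied to pasted "get" output. The class and method names are hypothetical and are not part of Kafka or ZooKeeper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ControllerNodeCheck {
    // Matches a line such as "ephemeralOwner = 0x16ecb572df50021" in zkCli stat output.
    private static final Pattern OWNER =
            Pattern.compile("ephemeralOwner\\s*=\\s*0x([0-9a-fA-F]+)");

    /** Returns true if the stat output describes an ephemeral znode (non-zero owner session). */
    public static boolean isEphemeral(String statOutput) {
        Matcher m = OWNER.matcher(statOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("no ephemeralOwner field found");
        }
        // A value of 0 means no session owns the node, i.e. it is persistent.
        return Long.parseUnsignedLong(m.group(1), 16) != 0L;
    }

    public static void main(String[] args) {
        String broken = "aclVersion = 1\nephemeralOwner = 0x0\ndataLength = 0";
        String normal = "aclVersion = 0\nephemeralOwner = 0x16ecb572df50021\ndataLength = 57";
        System.out.println("broken node ephemeral? " + isEphemeral(broken));
        System.out.println("normal node ephemeral? " + isEphemeral(normal));
    }
}
```

A /controller node for which this check returns false is the stale persistent node described in this issue.
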
> *NPE* in controller.log:
> {code:java}
> [2019-11-21 15:02:41,276] INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
> [2019-11-21 15:02:41,282] ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
> java.lang.NullPointerException
>  at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
>  at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2572)
>  at kafka.utils.Json$.parseBytes(Json.scala:62)
>  at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
>  at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
>  at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
>  at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1148)
>  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
>  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
>  at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
>  at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
>  at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
>  at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82){code}
>  
> I have therefore submitted a PR so that _ZkSecurityMigrator_ skips the _/controller_ node when it does not exist.
> This bug seems to affect all versions; please review and merge the PR as soon as possible.
> Thanks!
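
The behaviour described in the issue can be sketched roughly as follows. This is not the code from pull request 7778; it is an illustration under stated assumptions. The znode store is an in-memory map standing in for ZooKeeper, the path list is an illustrative subset of SecureRootPaths, and every class, field, and method name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MigratorSketch {
    // Illustrative subset of the secure root paths (assumption, not the real list).
    static final List<String> SECURE_ROOT_PATHS =
            Arrays.asList("/brokers", "/controller", "/config");

    // Paths that the brokers create as ephemeral nodes; the tool must never create them.
    static final Set<String> EPHEMERAL_PATHS = Collections.singleton("/controller");

    /**
     * Walks the secure root paths over an in-memory stand-in for ZooKeeper
     * (path -> data) and returns the paths that would have ACLs applied.
     */
    public static List<String> migrate(Map<String, byte[]> znodes) {
        List<String> secured = new ArrayList<>();
        for (String path : SECURE_ROOT_PATHS) {
            boolean exists = znodes.containsKey(path);
            if (!exists && EPHEMERAL_PATHS.contains(path)) {
                // Skip instead of creating a persistent node with null data.
                continue;
            }
            if (!exists) {
                // Mimics creating a missing persistent path with null data.
                znodes.put(path, null);
            }
            secured.add(path);
        }
        return secured;
    }

    public static void main(String[] args) {
        Map<String, byte[]> znodes = new HashMap<>();
        znodes.put("/brokers", null); // only /brokers exists before the run
        System.out.println("secured: " + migrate(znodes));
        System.out.println("/controller created: " + znodes.containsKey("/controller"));
    }
}
```

In this sketch a missing /controller is left alone, while an existing one (created by an elected controller) still has its ACL set along with the other paths.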



--
This message was sent by Atlassian Jira
(v8.3.4#803005)