Posted to dev@kafka.apache.org by "NanerLee (Jira)" <ji...@apache.org> on 2019/12/04 15:59:00 UTC

[jira] [Created] (KAFKA-9267) ZkSecurityMigrator should not create /controller node

NanerLee created KAFKA-9267:
-------------------------------

             Summary: ZkSecurityMigrator should not create /controller node
                 Key: KAFKA-9267
                 URL: https://issues.apache.org/jira/browse/KAFKA-9267
             Project: Kafka
          Issue Type: Bug
          Components: admin
            Reporter: NanerLee


As the source code shows ([ZkSecurityMigrator.scala#L226|https://github.com/apache/kafka/blob/2accf14ccf9b1f96c9dd8cfb94530c56378fae80/core/src/main/scala/kafka/admin/ZkSecurityMigrator.scala#L226]), _ZkSecurityMigrator_ checks and sets the ACL recursively for each path in _SecureRootPaths_, and _/controller_ is one of those paths.
As a result, _zkClient.makeSurePersistentPathExists()_ creates the _/controller_ node if it does not already exist.

_/controller_ is an *EPHEMERAL* node used for controller election, but _makeSurePersistentPathExists()_ creates a *PERSISTENT* node with *null* data.

 

If that happens, the null data causes an *NPE*, no controller can be elected, and the Kafka cluster becomes unavailable.
In addition, a *PERSISTENT* node does not disappear automatically, so we have to delete it manually to fix the problem.
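
To make the difference between the two node types concrete, here is a toy in-memory model. It is only a sketch: _ToyZk_ and all of its methods are hypothetical names, not the ZooKeeper or Kafka API. It assumes the usual ZooKeeper behaviour that creating a node fails when the path is already taken and that closing a session removes only that session's ephemeral nodes. In the real cluster the failure shows up even earlier, as the NPE, when the broker reads the existing node's data.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for a ZooKeeper namespace (hypothetical, not the real client API). */
final class ToyZk {
    /** A znode: data may be null; ephemeralOwner == 0 marks a persistent node. */
    static final class Node {
        final byte[] data;
        final long ephemeralOwner;

        Node(byte[] data, long ephemeralOwner) {
            this.data = data;
            this.ephemeralOwner = ephemeralOwner;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    /** Creates a persistent node with null data if the path is free. */
    void ensurePersistent(String path) {
        nodes.putIfAbsent(path, new Node(null, 0L));
    }

    /** Tries to create an ephemeral node; returns false if the path is already taken. */
    boolean createEphemeral(String path, byte[] data, long sessionId) {
        return nodes.putIfAbsent(path, new Node(data, sessionId)) == null;
    }

    /** Ending a session removes only the ephemeral nodes owned by that session. */
    void closeSession(long sessionId) {
        nodes.values().removeIf(n -> n.ephemeralOwner != 0L && n.ephemeralOwner == sessionId);
    }

    Node get(String path) {
        return nodes.get(path);
    }

    public static void main(String[] args) {
        ToyZk zk = new ToyZk();
        long brokerSession = 42L; // arbitrary non-zero session id, for illustration only

        zk.ensurePersistent("/controller"); // what the migrator effectively does
        boolean elected = zk.createEphemeral("/controller", new byte[0], brokerSession);
        zk.closeSession(brokerSession);

        System.out.println("elected = " + elected);
        System.out.println("node still present = " + (zk.get("/controller") != null));
    }
}
```

In this model the broker's ephemeral create is rejected and the node with null data is still there after the session ends, which is why a manual delete is the only way out.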


*PERSISTENT* _/controller_ node with *null* data in zk:
{code:java}
[zk: localhost:2181(CONNECTED) 16] get /kafka/controller
null
cZxid = 0x1100002284
ctime = Tue Dec 03 18:37:26 CST 2019
mZxid = 0x1100002284
mtime = Tue Dec 03 18:37:26 CST 2019
pZxid = 0x1100002284
cversion = 0
dataVersion = 0
aclVersion = 1
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0{code}
 

*Normal* /controller node in zk:
{code:java}
[zk: localhost:2181(CONNECTED) 21] get /kafka/controller
{"version":1,"brokerid":1001,"timestamp":"1575370170528"}
cZxid = 0x11000023e1
ctime = Tue Dec 03 18:49:30 CST 2019
mZxid = 0x11000023e1
mtime = Tue Dec 03 18:49:30 CST 2019
pZxid = 0x11000023e1
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x16ecb572df50021
dataLength = 57
numChildren = 0{code}
 

*NPE* in controller.log:

 
{code:java}
[2019-11-21 15:02:41,276] INFO [ControllerEventThread controllerId=1002] Starting (kafka.controller.ControllerEventManager$ControllerEventThread)
[2019-11-21 15:02:41,282] ERROR [ControllerEventThread controllerId=1002] Error processing event Startup (kafka.controller.ControllerEventManager$ControllerEventThread)
java.lang.NullPointerException
 at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:857)
 at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2572)
 at kafka.utils.Json$.parseBytes(Json.scala:62)
 at kafka.zk.ControllerZNode$.decode(ZkData.scala:56)
 at kafka.zk.KafkaZkClient.getControllerId(KafkaZkClient.scala:902)
 at kafka.controller.KafkaController.kafka$controller$KafkaController$$elect(KafkaController.scala:1199)
 at kafka.controller.KafkaController$Startup$.process(KafkaController.scala:1148)
 at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:86)
 at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
 at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:86)
 at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
 at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:85)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82){code}
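
The trace shows the null bytes travelling from _getControllerId_ through _ControllerZNode.decode_ into the JSON parser. The sketch below illustrates that failure mode with a null guard. It is not Kafka's decoding code and not the change in my PR: _ControllerDataSketch_ and _brokerId_ are hypothetical names, and a regular expression stands in for the real JSON parsing to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads the broker id from /controller data without failing on null. */
final class ControllerDataSketch {
    // Matches the "brokerid" field of the JSON stored in a normal /controller node.
    private static final Pattern BROKER_ID = Pattern.compile("\"brokerid\"\\s*:\\s*(\\d+)");

    static OptionalInt brokerId(byte[] data) {
        if (data == null) {
            // A node created without data: report "no controller" instead of throwing.
            return OptionalInt.empty();
        }
        Matcher m = BROKER_ID.matcher(new String(data, StandardCharsets.UTF_8));
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        byte[] normal = "{\"version\":1,\"brokerid\":1001,\"timestamp\":\"1575370170528\"}"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(brokerId(normal)); // data from the normal node shown above
        System.out.println(brokerId(null));   // data from the persistent node
    }
}
```

A guard like this would only hide the symptom. The persistent node would still block the election, so the node must not be created in the first place.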
 


So I have submitted a PR so that _ZkSecurityMigrator_ does not handle the _/controller_ node when it does not exist.
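
The idea can be sketched as follows. This is a simplified illustration, not the actual patch: _MigratorSkipSketch_ and _pathsToSecure_ are hypothetical names, a plain set of strings stands in for ZooKeeper, and the path list in _main_ is only an example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed behaviour: never create a missing /controller node. */
final class MigratorSkipSketch {
    static final String CONTROLLER_PATH = "/controller";

    /**
     * Returns the root paths whose ACLs would be set. Missing paths are created
     * (added to 'existing'), except /controller, which is skipped when absent.
     */
    static List<String> pathsToSecure(List<String> secureRootPaths, Set<String> existing) {
        List<String> secured = new ArrayList<>();
        for (String path : secureRootPaths) {
            boolean exists = existing.contains(path);
            if (path.equals(CONTROLLER_PATH) && !exists) {
                continue; // leave creation to the controller election
            }
            if (!exists) {
                existing.add(path); // stand-in for creating the persistent path
            }
            secured.add(path);
        }
        return secured;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("/brokers"));
        List<String> roots = Arrays.asList("/brokers", "/controller", "/config");
        System.out.println(pathsToSecure(roots, existing));
        System.out.println("controller created = " + existing.contains(CONTROLLER_PATH));
    }
}
```

When _/controller_ already exists as a normal ephemeral node, it is still processed like every other path, so its ACL is set as before.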

This bug seems to affect all versions, so please review and merge the PR as soon as possible.

Thanks!


