Posted to dev@kafka.apache.org by "Flavio Junqueira (JIRA)" <ji...@apache.org> on 2015/09/22 23:10:07 UTC

[jira] [Commented] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time

    [ https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903437#comment-14903437 ] 

Flavio Junqueira commented on KAFKA-1387:
-----------------------------------------

hey [~guozhang]

bq. Will the mixed usage of ZK directly and ZkClient together violate ordering? AFAIK ZkClient orders all events fired by watchers and hands them to the user callbacks one by one; if we use ZK's Watcher directly, will its callback be called out of order with other events?

ZkClient does indeed hand the processing off to a separate thread: to avoid blocking the ZooKeeper dispatcher thread, it delivers events to user callbacks from its own event thread. This could be a problem if the events handled here and the events handled directly by ZkClient are correlated, which is why I tried to confine the ZK processing for this feature to a single class. I don't see a concrete problem, but if you do, let me know. Right now it sounds like you're only speculating that it could be a problem, is that right?
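To make the ordering question concrete, here is a minimal Python sketch of the pattern described above: a single event thread takes events off a FIFO queue and runs user callbacks one at a time, so the producer side (standing in for the ZooKeeper dispatcher thread) never blocks on a callback. It is a simplified model under that assumption, not ZkClient's actual implementation; `SerialEventDispatcher` is a hypothetical name.

```python
import queue
import threading


class SerialEventDispatcher:
    """Deliver events to handlers one at a time, in submission order.

    Hypothetical model of a client that hands watcher events off to a single
    event thread so the ZooKeeper dispatcher thread is never blocked.
    """

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self):
        self._events = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, handler, event):
        # Dispatcher side: enqueue and return immediately.
        self._events.put((handler, event))

    def _run(self):
        while True:
            item = self._events.get()
            if item is self._STOP:
                return
            handler, event = item
            handler(event)  # one callback at a time, FIFO order

    def close(self):
        # Let the worker drain everything already queued, then stop it.
        self._events.put(self._STOP)
        self._worker.join()


delivered = []
dispatcher = SerialEventDispatcher()
for i in range(5):
    dispatcher.submit(delivered.append, f"event-{i}")
dispatcher.close()
print(delivered)  # ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
```

Events that reach a callback by some other path, for example a raw ZooKeeper Watcher invoked on the dispatcher thread, bypass this queue, so their order relative to the queued events is not guaranteed. That only matters if the two streams are correlated, which is the reason for keeping the feature's ZK processing in one place.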

bq. If we get a Code.OK in CreateCallback, do we still need to trigger a ZooKeeper.exists with ExistsCallback again?

Right, that exists call is to set a watch.
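A toy illustration of why the exists call is still needed after Code.OK: in ZooKeeper a create sets no watch, while exists can register one. The in-memory `FakeZooKeeper`, the `Code` enum and `on_create_result` are hypothetical stand-ins for the real client and callbacks, assumed only for this sketch.

```python
from enum import Enum


class Code(Enum):
    # Hypothetical stand-ins for a few of ZooKeeper's KeeperException.Code values.
    OK = "OK"
    NODEEXISTS = "NODEEXISTS"


class FakeZooKeeper:
    """In-memory toy: create() sets no watch, exists() can register one."""

    def __init__(self):
        self.nodes = {}
        self.watches = {}  # path -> one-shot watcher callbacks

    def create(self, path, data):
        if path in self.nodes:
            return Code.NODEEXISTS
        self.nodes[path] = data
        return Code.OK

    def exists(self, path, watcher=None):
        if watcher is not None:
            self.watches.setdefault(path, []).append(watcher)
        return path in self.nodes

    def delete(self, path):
        del self.nodes[path]
        # Watches fire once and are then cleared.
        for watcher in self.watches.pop(path, []):
            watcher(("NodeDeleted", path))


def on_create_result(zk, path, code, watcher):
    # A successful create() leaves no watch behind, so follow it with
    # exists() purely to register one for later changes to the node.
    if code is Code.OK:
        return zk.exists(path, watcher)
    raise RuntimeError(f"create of {path} returned {code.name}")


events = []
zk = FakeZooKeeper()
rc = zk.create("/broker/1", b"broker-1")
on_create_result(zk, "/broker/1", rc, events.append)
zk.delete("/broker/1")  # e.g. the node disappears when its session expires
print(events)  # [('NodeDeleted', '/broker/1')]
```

Without the exists call, the deletion would go unnoticed because nothing was watching the node.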

bq. For the consumer / server registration case in particular, we try to handle parent path creation in ZkUtils.makeSurePersistentPathExists, so I feel we should expose the problem that the parent path does not exist yet instead of trying to hide it in createRecursive.

I've commented on the PR about this. What's your specific concern here? If you could elaborate a bit more, I'd appreciate it.
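The trade-off under discussion can be sketched with a toy store that, like ZooKeeper, rejects a create whose parent is missing (NONODE). `create_recursive` quietly creates the missing parents, while a plain create surfaces the error so the caller (for example, code that relies on ZkUtils.makeSurePersistentPathExists having run first) learns that its assumption was wrong. All names and the `/brokers/ids/1` path are illustrative assumptions, not the code in the PR.

```python
import posixpath


class NoNodeError(Exception):
    """Stand-in for ZooKeeper's NONODE result: a parent znode is missing."""


class NodeExistsError(Exception):
    """Stand-in for ZooKeeper's NODEEXISTS result."""


class FakeZooKeeper:
    """In-memory toy that, like ZooKeeper, refuses to create orphan nodes."""

    def __init__(self):
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        if path in self.nodes:
            raise NodeExistsError(path)
        if posixpath.dirname(path) not in self.nodes:
            raise NoNodeError(path)
        self.nodes[path] = data


def create_recursive(zk, path, data=b""):
    """Create path, silently creating any missing parents first."""
    try:
        zk.create(path, data)
    except NoNodeError:
        try:
            create_recursive(zk, posixpath.dirname(path))
        except NodeExistsError:
            pass  # in a real cluster another client may create it first
        zk.create(path, data)


zk = FakeZooKeeper()
try:
    zk.create("/brokers/ids/1", b"data")  # strict: the missing parent surfaces
    surfaced = False
except NoNodeError:
    surfaced = True
create_recursive(zk, "/brokers/ids/1", b"data")  # lenient: parents are made
print(surfaced, sorted(zk.nodes))
# True ['/', '/brokers', '/brokers/ids', '/brokers/ids/1']
```

The recursive version is more forgiving, but it also hides a setup step that was supposed to have happened already; which behaviour is preferable depends on whether a missing parent should ever be considered normal.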

> Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Fedor Korotkiy
>            Assignee: Flavio Junqueira
>            Priority: Critical
>              Labels: newbie, patch, zkclient-problems
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-1387.patch, kafka-1387.patch
>
>
> The Kafka broker re-registers itself in ZooKeeper every time the handleNewSession() callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala 
> Now imagine the following sequence of events.
> 1) The ZooKeeper session is re-established. The handleNewSession() callback is queued by zkClient, but not invoked yet.
> 2) The ZooKeeper session is re-established again, queueing the callback a second time.
> 3) The first callback is invoked, creating the /broker/[id] ephemeral path.
> 4) The second callback is invoked and tries to create the /broker/[id] path using the createEphemeralPathExpectConflictHandleZKBug() function. But the path already exists, so createEphemeralPathExpectConflictHandleZKBug() gets stuck in an infinite loop.
> It seems the controller election code has the same issue.
> I am able to reproduce this issue on the 0.8.1 branch from GitHub using the following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start Kafka and ZooKeeper, then pause ZooKeeper several times using Ctrl-Z.
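The race in steps 1) to 4) of the issue can be replayed in a single-threaded toy model (not Kafka's code). `register_naive` follows an assumed reading of createEphemeralPathExpectConflictHandleZKBug(): if the existing node holds the caller's own data, treat it as left over from an earlier session and retry after a backoff; the attempt limit exists only so the demo terminates. `register_checked` shows one possible way out, comparing the node's ephemeral owner with the current session id (real ZooKeeper exposes these as Stat.getEphemeralOwner() and ZooKeeper.getSessionId()). All class and function names are hypothetical.

```python
class FakeZooKeeper:
    """Toy single-client model; every node here is ephemeral."""

    def __init__(self):
        self.session_id = 0
        self.nodes = {}  # path -> (data, ephemeral owner session id)

    def new_session(self):
        # Expiry removes the old session's ephemeral nodes (all nodes here).
        self.nodes.clear()
        self.session_id += 1

    def create_ephemeral(self, path, data):
        if path in self.nodes:
            return False  # NODEEXISTS
        self.nodes[path] = (data, self.session_id)
        return True


def register_naive(zk, path, data, max_attempts):
    """Retry while the existing node holds our own data (bounded for the demo)."""
    for attempt in range(max_attempts):
        if zk.create_ephemeral(path, data):
            return attempt
        stored_data, _owner = zk.nodes[path]
        if stored_data != data:
            raise RuntimeError("node was written by another broker")
        # Assume a previous session left the node behind; the real loop sleeps
        # for a backoff here and retries, with no upper bound.
    return None  # gave up; the unbounded version would spin forever


def register_checked(zk, path, data):
    """Treat a node owned by the current session as already registered."""
    if zk.create_ephemeral(path, data):
        return "created"
    _stored_data, owner = zk.nodes[path]
    if owner == zk.session_id:
        return "already ours"
    return f"owned by session {owner}"


zk = FakeZooKeeper()
zk.new_session()  # step 1: session re-established, first callback queued
zk.new_session()  # step 2: re-established again, second callback queued
path, data = "/broker/1", b"broker-1"
first = register_naive(zk, path, data, max_attempts=3)   # step 3
second = register_naive(zk, path, data, max_attempts=3)  # step 4
print(first, second)                     # 0 None
print(register_checked(zk, path, data))  # already ours
```

Both queued callbacks run in the same, latest session, so the node created in step 3 never expires, and a loop that waits for it to disappear cannot finish.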


