Posted to dev@kafka.apache.org by "Guozhang Wang (JIRA)" <ji...@apache.org> on 2013/07/31 04:43:48 UTC

[jira] [Commented] (KAFKA-992) Double Check on Broker Registration to Avoid False NodeExist Exception

    [ https://issues.apache.org/jira/browse/KAFKA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724789#comment-13724789 ] 

Guozhang Wang commented on KAFKA-992:
-------------------------------------

We can differentiate this edge case from a temporary connection loss by adding a timestamp to the broker's ZK registration string, so that the conflict is always reflected. We can then check whether the host:port is the same. If it is, we can treat the ephemeral node as written by the broker itself in a previous session, back off until it is deleted on session timeout, and then retry creating the ephemeral node. This turns the temporary connection loss into a false-positive case, but that should be fine since it happens rarely.
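
A minimal sketch of the back-off-and-retry idea above, written in Python for brevity rather than as actual Kafka code. Everything here is an assumption made for illustration: `FakeZk` is a hypothetical in-memory stand-in for a ZooKeeper client, and `register_broker`, the JSON payload layout, and the retry parameters are invented names, not existing Kafka or ZooKeeper APIs.

```python
import json
import time


class NodeExistsError(Exception):
    """Raised by the fake client when the path is already present."""


class FakeZk:
    """Hypothetical in-memory stand-in for a ZooKeeper client."""

    def __init__(self):
        self.nodes = {}

    def create_ephemeral(self, path, data):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data

    def read(self, path):
        return self.nodes.get(path)

    def delete(self, path):
        self.nodes.pop(path, None)


def register_broker(zk, path, host, port, timestamp,
                    backoff_s=0.01, max_retries=5, sleep=time.sleep):
    """Create the broker's ephemeral node, backing off on a stale copy.

    Returns the number of retries that were needed.
    """
    # The timestamp makes this payload differ from one written in an
    # earlier session, so a leftover node always shows up as a conflict.
    data = json.dumps({"host": host, "port": port, "timestamp": timestamp})
    for attempt in range(max_retries + 1):
        try:
            zk.create_ephemeral(path, data)
            return attempt
        except NodeExistsError:
            existing = zk.read(path)
            if existing is None:
                continue  # node disappeared in the meantime; retry at once
            info = json.loads(existing)
            if (info["host"], info["port"]) != (host, port):
                # A different broker holds this id: a genuine conflict.
                raise RuntimeError("broker id registered by another broker")
            # Same host:port: assume it is our own node from a previous
            # session and wait for ZooKeeper to delete it.
            sleep(backoff_s)
    raise TimeoutError("stale ephemeral node was not deleted in time")
```

Note that a retry after a temporary connection loss takes the same back-off path, which is the false-positive case mentioned above.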

                
> Double Check on Broker Registration to Avoid False NodeExist Exception
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-992
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>
> There is a potential bug in ZooKeeper: when the ZK leader processes a lot of session expiration events (this could be due to a long GC or an fsync operation, etc.), it marks the session as expired but does not delete the corresponding ephemeral znode at the same time.
> Meanwhile, a new session event will be fired on the Kafka server, and the server will request the same ephemeral node to be created when handling the new session. When it enters the ZooKeeper processing queue, this operation receives a NodeExists error, since the ZooKeeper leader has not finished deleting that ephemeral znode and still thinks the previous session holds it. Kafka assumes that the NodeExists error on ephemeral node creation is OK, since it is a legitimate condition that happens during session disconnects on ZooKeeper. However, a NodeExists error is only valid if the owner session id also matches the Kafka server's current ZooKeeper session id. The bug is that before sending a NodeExists error, ZooKeeper should check whether the ephemeral node in question is held by a session that has been marked as expired.
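
The ownership condition in the description above (a NodeExists error is only benign when the node's owner session matches the current session) can be sketched as follows. This is an illustrative Python sketch, not Kafka code: `NodeStat` and `node_exists_is_benign` are hypothetical names, and `ephemeral_owner` is assumed to carry the same value as the `ephemeralOwner` field of ZooKeeper's `Stat` structure.

```python
from dataclasses import dataclass


@dataclass
class NodeStat:
    # Session id that owns the ephemeral znode (hypothetical mirror of
    # the ephemeralOwner field in ZooKeeper's Stat).
    ephemeral_owner: int


def node_exists_is_benign(stat: NodeStat, current_session_id: int) -> bool:
    """Return True only if the existing node belongs to our own session.

    A mismatch means the node was left behind by another session, possibly
    an expired one whose ephemeral node has not yet been deleted.
    """
    return stat.ephemeral_owner == current_session_id
```

A caller would treat `False` as a real conflict to investigate or retry on, rather than silently accepting the NodeExists error.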
