Posted to users@kafka.apache.org by tao xiao <xi...@gmail.com> on 2015/07/08 14:07:49 UTC

Got conflicted ephemeral node exception for several hours

Hi team,

I have 10 high-level consumers connecting to Kafka, and one of them kept
complaining "conflicted ephemeral node" for about 8 hours. The log was
filled with the exception below:

[2015-07-07 14:03:51,615] INFO conflict in
/consumers/group/ids/test-1435856975563-9a9fdc6c data:
{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}
stored data:
{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275558570"}
(kafka.utils.ZkUtils$)
[2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
[{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}]
at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
different session, hence I will backoff for this node to be deleted by
Zookeeper and retry (kafka.utils.ZkUtils$)

In the meantime ZooKeeper reported the exception below over the same time span:

2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when
processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
Path:/consumers/group/ids/test-1435856975563-9a9fdc6c Error:KeeperErrorCode
= NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c

In the end ZooKeeper timed out the session and the consumers triggered a
rebalance.

I know the conflicted ephemeral node warning is there to handle a ZooKeeper
bug in which session expiration and ephemeral node deletion are not done
atomically, but as the ZooKeeper log indicates, ZooKeeper never got a chance
to delete the ephemeral node, which makes me think the session had not
actually expired at that time. Yet for some reason ZooKeeper fired a
session-expired event, which subsequently invoked ZKSessionExpireListener.
I was just wondering whether anyone has encountered a similar issue before,
and what I can do on the ZooKeeper side to prevent this?

Another problem is that the createEphemeralPathExpectConflictHandleZKBug call
is wrapped in a while(true) loop which runs forever until the ephemeral node
is created. Would it be better to employ an exponential retry policy with a
maximum number of retries, so that the exception gets a chance to be
re-thrown to the caller and handled there in situations like the above?
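
Something along these lines is what I have in mind; it is just a rough,
untested sketch (the retryWithBackoff helper, the default retry count and
the backoff values are all made up, this is not the existing ZkUtils code):

import scala.util.{Failure, Success, Try}

object ZkRetrySketch {
  // Illustrative only: a bounded retry helper that gives up and re-throws
  // instead of spinning in while(true) forever.
  def retryWithBackoff[T](op: () => T,
                          maxRetries: Int = 10,
                          baseBackoffMs: Long = 500L): T = {
    var attempt = 0
    var result: Option[T] = None
    var lastError: Throwable = null
    while (result.isEmpty && attempt <= maxRetries) {
      Try(op()) match {
        case Success(v) => result = Some(v)
        case Failure(e) =>
          lastError = e
          attempt += 1
          if (attempt <= maxRetries) {
            // Exponential backoff, capped at 30 seconds between attempts.
            Thread.sleep(math.min(baseBackoffMs << (attempt - 1), 30000L))
          }
      }
    }
    // Once the retry budget is exhausted, surface the failure to the caller
    // instead of looping forever.
    result.getOrElse(throw new RuntimeException(
      s"Gave up after $attempt attempts", lastError))
  }
}

// Hypothetical usage, wrapping the ephemeral node creation so the caller
// finally sees the failure:
//   retryWithBackoff(() => zkClient.createEphemeral(path, data))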

Re: Got conflicted ephemeral node exception for several hours

Posted by Mayuresh Gharat <gh...@gmail.com>.
Bouncing the consumers should solve this issue in most cases.

Thanks,

Mayuresh

On Sun, Jul 12, 2015 at 8:21 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Hi Tao,
>
> We see this error from time to time but did not think of this as a big
> issue. Any reason it bothers you much?
> I'm not sure if throwing exception to user on this exception is a good
> handling or not. What are user supposed to do in that case other than
> retry?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On 7/12/15, 7:16 PM, "tao xiao" <xi...@gmail.com> wrote:
>
> >We saw the error again in our cluster.  Anyone has the same issue before?
> >
> >On Fri, 10 Jul 2015 at 13:26 tao xiao <xi...@gmail.com> wrote:
> >
> >> Bump the thread. Any help would be appreciated.
> >>
> >> On Wed, 8 Jul 2015 at 20:09 tao xiao <xi...@gmail.com> wrote:
> >>
> >>> Additional info
> >>> Kafka version: 0.8.2.1
> >>> zookeeper: 3.4.6
> >>>
> >>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xi...@gmail.com> wrote:
> >>>
> >>>> Hi team,
> >>>>
> >>>> I have 10 high level consumers connecting to Kafka and one of them
> >>>>kept
> >>>> complaining "conflicted ephemeral node" for about 8 hours. The log was
> >>>> filled with below exception
> >>>>
> >>>> [2015-07-07 14:03:51,615] INFO conflict in
> >>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
> >>>>
> >>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
> >>>>amp":"1436275631510"}
> >>>> stored data:
> >>>>
> >>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
> >>>>amp":"1436275558570"}
> >>>> (kafka.utils.ZkUtils$)
> >>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
> >>>>
> >>>>[{"version":1,"subscription":{"test.*":1},"pattern":"white_list","times
> >>>>tamp":"1436275631510"}]
> >>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
> >>>> different session, hence I will backoff for this node to be deleted by
> >>>> Zookeeper and retry (kafka.utils.ZkUtils$)
> >>>>
> >>>> In the meantime zookeeper reported below exception for the same time
> >>>>span
> >>>>
> >>>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
> >>>> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
> >>>> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
> >>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
> >>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c
> >>>>Error:KeeperErrorCode
> >>>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
> >>>>
> >>>> At the end zookeeper timed out the session and consumers triggered
> >>>> rebalance.
> >>>>
> >>>> I know that conflicted ephemeral node warning is to handle a zookeeper
> >>>> bug that session expiration and ephemeral node deletion are not done
> >>>> atomically but as indicated from zookeeper log the zookeeper never
> >>>>got a
> >>>> chance to delete the ephemeral node which made me think that the
> >>>>session
> >>>> was not expired at that time. And for some reason zookeeper fired
> >>>>session
> >>>> expire event which subsequently invoked ZKSessionExpireListener.  I
> >>>>was
> >>>> just wondering if anyone have ever encountered similar issue before
> >>>>and
> >>>> what I can do at zookeeper side to prevent this?
> >>>>
> >>>> Another problem is that createEphemeralPathExpectConflictHandleZKBug
> >>>> call is wrapped in a while(true) loop which runs forever until the
> >>>> ephemeral node is created. Would it be better that we can employ an
> >>>> exponential retry policy with a max number of retries so that it has a
> >>>> chance to re-throw the exception back to caller and let caller handle
> >>>>it in
> >>>> situation like above?
> >>>>
> >>>>
>
>


-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Got conflicted ephemeral node exception for several hours

Posted by Jaikiran Pai <ja...@gmail.com>.
I am on Kafka 0.8.2.1 (Java 8) and have happened to run into this same 
issue, where the KafkaServer (broker) goes into an indefinite while loop 
writing out this message:

[2015-08-04 15:45:12,350] INFO conflict in /brokers/ids/0 data: 
{"jmx_port":-1,"timestamp":"1438661432074","host":"foo-bar","version":1,"port":9092} 
stored data: 
{"jmx_port":-1,"timestamp":"1438661429589","host":"foo-bar","version":1,"port":9092} 
(kafka.utils.ZkUtils$)
[2015-08-04 15:45:12,352] INFO I wrote this conflicted ephemeral node 
[{"jmx_port":-1,"timestamp":"1438661432074","host":"foo-bar","version":1,"port":9092}] 
at /brokers/ids/0 a while back in a different session, hence I will 
backoff for this node to be deleted by Zookeeper and retry 
(kafka.utils.ZkUtils$)

The above two lines have been repeating every few seconds for the past 
20-odd hours on this broker. The broker has been rendered useless and is 
contributing to high CPU usage.

As a result the consumers have gone into a state where they no longer 
consume messages. Furthermore, this continuous looping has put the Kafka 
process at the top of CPU usage. I understand that bouncing the consumer 
is an option and will probably "fix" this, but in our real production 
environments we won't be able to bounce the consumers. I currently have 
access to logs (some of which have been pasted here). Is there any chance 
these logs help in narrowing down the issue and fixing the root cause? 
Can we also please add a retry max limit of some kind to this Zookeeper 
node creation logic instead of going into an indefinite while loop?

I have maintained the original timestamps in the logs to help narrow 
down the issue. The 1438661432074 (millisecond) value in the log 
translates to Aug 03 2015 21:10:32 (PDT) and 1438661429589 translates to 
Aug 03 2015 21:10:30 (PDT); a quick conversion sketch is below. After it I 
have included the relevant part of the log snippet from the server.log of 
the broker (10.95.100.31).
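
For anyone cross-checking those values, a small standalone snippet (plain 
JDK classes, nothing Kafka-specific; the object name is made up) that does 
the epoch-millisecond to PDT conversion:

import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object EpochToPdt {
  def main(args: Array[String]): Unit = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss z")
    fmt.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"))
    // The two "timestamp" values from the conflicting broker registrations.
    Seq(1438661432074L, 1438661429589L).foreach { millis =>
      println(s"$millis -> ${fmt.format(new Date(millis))}")
    }
  }
}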


[2015-08-03 21:10:29,805] ERROR Closing socket for /10.95.100.31 because 
of error (kafka.network.Processor)
java.io.IOException: Broken pipe
     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
     at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
     at kafka.network.MultiSend.writeTo(Transmission.scala:101)
     at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
     at kafka.network.Processor.write(SocketServer.scala:472)
     at kafka.network.Processor.run(SocketServer.scala:342)
     at java.lang.Thread.run(Thread.java:745)
[2015-08-03 21:10:29,938] ERROR Closing socket for /10.95.100.31 because 
of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
     at sun.nio.ch.IOUtil.read(IOUtil.java:197)
     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
     at kafka.utils.Utils$.read(Utils.scala:380)
     at 
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
     at kafka.network.Processor.read(SocketServer.scala:444)
     at kafka.network.Processor.run(SocketServer.scala:340)
     at java.lang.Thread.run(Thread.java:745)
[2015-08-03 21:10:30,045] ERROR Closing socket for /10.95.100.31 because 
of error (kafka.network.Processor)
java.io.IOException: Broken pipe
     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
     at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
     at kafka.network.MultiSend.writeTo(Transmission.scala:101)
     at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
     at kafka.network.Processor.write(SocketServer.scala:472)
     at kafka.network.Processor.run(SocketServer.scala:342)
     at java.lang.Thread.run(Thread.java:745)

<<< a lot more similar exceptions >>>


[2015-08-03 21:10:31,304] INFO Closing socket connection to 
/10.95.100.31. (kafka.network.Processor)
[2015-08-03 21:10:31,397] INFO Closing socket connection to 
/10.95.100.31. (kafka.network.Processor)
[2015-08-03 21:10:31,399] INFO Closing socket connection to 
/10.95.100.31. (kafka.network.Processor)
[2015-08-03 21:10:31,445] INFO Closing socket connection to 
/10.95.100.31. (kafka.network.Processor)
....
<<<< bunch of similar logs as above >>>>

[2015-08-03 21:10:31,784] INFO [ReplicaFetcherManager on broker 0] 
Removed fetcher for partitions [<partition list>] 
(kafka.server.ReplicaFetcherManager)
[2015-08-03 21:10:31,860] INFO Closing socket connection to 
/10.95.100.31. (kafka.network.Processor)
[2015-08-03 21:10:31,861] INFO 0 successfully elected as leader 
(kafka.server.ZookeeperLeaderElector)
[2015-08-03 21:10:32,072] INFO Closing socket connection to 
/10.95.100.31. (kafka.network.Processor)
[2015-08-03 21:10:32,074] INFO re-registering broker info in ZK for 
broker 0 (kafka.server.KafkaHealthcheck)
[2015-08-03 21:10:32,103] INFO conflict in /brokers/ids/0 data: 
{"jmx_port":-1,"timestamp":"1438661432074","host":"foo-bar","version":1,"port":9092} 
stored data: 
{"jmx_port":-1,"timestamp":"1438661429589","host":"foo-bar","version":1,"port":9092} 
(kafka.utils.ZkUtils$)
[2015-08-03 21:10:32,104] INFO I wrote this conflicted ephemeral node 
[{"jmx_port":-1,"timestamp":"1438661432074","host":"foo-bar","version":1,"port":9092}] 
at /brokers/ids/0 a while back in a different session, hence I will 
backoff for this node to be deleted by Zookeeper and retry 
(kafka.utils.ZkUtils$)
[2015-08-03 21:10:38,107] INFO conflict in /brokers/ids/0 data: 
{"jmx_port":-1,"timestamp":"1438661432074","host":"foo-bar","version":1,"port":9092} 
stored data: 
{"jmx_port":-1,"timestamp":"1438661429589","host":"foo-bar","version":1,"port":9092} 
(kafka.utils.ZkUtils$)
[2015-08-03 21:10:38,109] INFO I wrote this conflicted ephemeral node 
[{"jmx_port":-1,"timestamp":"1438661432074","host":"foo-bar","version":1,"port":9092}] 
at /brokers/ids/0 a while back in a different session, hence I will 
backoff for this node to be deleted by Zookeeper and retry 
(kafka.utils.ZkUtils$)
[2015-08-03 21:10:44,118] INFO conflict in /brokers/ids/0 data: 
{"jmx_port":-1,"timestamp":"1438661432074","host":"foo-bar","version":1,"port":9092} 
stored data: 
{"jmx_port":-1,"timestamp":"1438661429589","host":"foo-bar","version":1,"port":9092} 
(kafka.utils.ZkUtils$)
[2015-08-03 21:10:44,120] INFO I wrote this conflicted ephemeral node 
[{"jmx_port":-1,"timestamp":"1438661432074","host":"foo-bar","version":1,"port":9092}] 
at /brokers/ids/0 a while back in a different session, hence I will 
backoff for this node to be deleted by Zookeeper and retry 
(kafka.utils.ZkUtils$)
[2015-08-03 21:10:50,123] INFO conflict in /brokers/ids/0 data: 
{"jmx_port":-1,"timestamp":"1438661432074","host":"foo-bar","version":1,"port":9092} 
stored data: 
{"jmx_port":-1,"timestamp":"1438661429589","host":"foo-bar","version":1,"port":9092} 
(kafka.utils.ZkUtils$)

<<< these above couple of lines continue for around 20 odd hours now >>>

I could probably get more logs if needed, assuming the VM isn't shut down 
by then.

-Jaikiran


On Monday 13 July 2015 10:34 AM, tao xiao wrote:
> Finding out root cause is definitely something we should do and I hope the
> new Java consumer API will not have such issues anymore. Here are my
> observation of the issue: the zk retry was caused by session timeout. In
> theory the ephemeral node should be removed after the session is terminated
> by zk but in fact the node still presented after the session was gone. I
> saw lots of slow fsync warning in zk log like below but I am not sure if
> the warning has anything to do with the error I encountered.
>
> 2015-07-12 22:01:21,195 [myid:2] - WARN  [SyncThread:2:FileTxnLog@334] -
> fsync-ing the write ahead log in SyncThread:2 took 1023ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
>
>
> I know in Linkedin ZK is installed on SSD does it help with this?
>
> On Mon, 13 Jul 2015 at 12:50 Mayuresh Gharat <gh...@gmail.com>
> wrote:
>
>> That would solve this. But it looks like a work around. We need to check
>> why this happens exactly and get to the root cause. What do you think?
>> Getting to the root cause of this might be really useful.
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Sun, Jul 12, 2015 at 8:45 PM, tao xiao <xi...@gmail.com> wrote:
>>
>>> Restart the consumers does fix the issue. But since the zk retry is
>> wrapped
>>> in an infinite loop it doesn't give a chance to consumer to respond it
>>> until some one notices and restarts. Why I suggest to have a maximum
>> retry
>>> policy is if max retry is reached it can invoke a customer handler which
>> I
>>> can then inject a restart call so that it can remedy itself without
>>> people's attention.
>>>
>>> On Mon, 13 Jul 2015 at 11:36 Jiangjie Qin <jq...@linkedin.com.invalid>
>>> wrote:
>>>
>>>> Hi Tao,
>>>>
>>>> We see this error from time to time but did not think of this as a big
>>>> issue. Any reason it bothers you much?
> >>>> I'm not sure if throwing exception to user on this exception is a good
>>>> handling or not. What are user supposed to do in that case other than
>>>> retry?
>>>>
>>>> Thanks,
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On 7/12/15, 7:16 PM, "tao xiao" <xi...@gmail.com> wrote:
>>>>
>>>>> We saw the error again in our cluster.  Anyone has the same issue
>>> before?
>>>>> On Fri, 10 Jul 2015 at 13:26 tao xiao <xi...@gmail.com> wrote:
>>>>>
>>>>>> Bump the thread. Any help would be appreciated.
>>>>>>
>>>>>> On Wed, 8 Jul 2015 at 20:09 tao xiao <xi...@gmail.com> wrote:
>>>>>>
>>>>>>> Additional info
>>>>>>> Kafka version: 0.8.2.1
>>>>>>> zookeeper: 3.4.6
>>>>>>>
>>>>>>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xi...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi team,
>>>>>>>>
>>>>>>>> I have 10 high level consumers connecting to Kafka and one of them
>>>>>>>> kept
>>>>>>>> complaining "conflicted ephemeral node" for about 8 hours. The log
>>> was
>>>>>>>> filled with below exception
>>>>>>>>
>>>>>>>> [2015-07-07 14:03:51,615] INFO conflict in
>>>>>>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
>>>>>>>>
>>>>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
>>>>>>>> amp":"1436275631510"}
>>>>>>>> stored data:
>>>>>>>>
>>>>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
>>>>>>>> amp":"1436275558570"}
>>>>>>>> (kafka.utils.ZkUtils$)
>>>>>>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral
>>> node
>>>>>> [{"version":1,"subscription":{"test.*":1},"pattern":"white_list","times
>>>>>>>> tamp":"1436275631510"}]
>>>>>>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back
>> in
>>> a
>>>>>>>> different session, hence I will backoff for this node to be
>> deleted
>>> by
>>>>>>>> Zookeeper and retry (kafka.utils.ZkUtils$)
>>>>>>>>
>>>>>>>> In the meantime zookeeper reported below exception for the same
>> time
>>>>>>>> span
>>>>>>>>
>>>>>>>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
>>>>>>>> cport:-1)::PrepRequestProcessor@645] - Got user-level
>>> KeeperException
>>>>>>>> when processing sessionid:0x44e657ff19c0019 type:create
>> cxid:0x7a26
>>>>>>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
>>>>>>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c
>>>>>>>> Error:KeeperErrorCode
>>>>>>>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
>>>>>>>>
>>>>>>>> At the end zookeeper timed out the session and consumers triggered
>>>>>>>> rebalance.
>>>>>>>>
>>>>>>>> I know that conflicted ephemeral node warning is to handle a
>>> zookeeper
>>>>>>>> bug that session expiration and ephemeral node deletion are not
>> done
>>>>>>>> atomically but as indicated from zookeeper log the zookeeper never
>>>>>>>> got a
>>>>>>>> chance to delete the ephemeral node which made me think that the
>>>>>>>> session
>>>>>>>> was not expired at that time. And for some reason zookeeper fired
>>>>>>>> session
>>>>>>>> expire event which subsequently invoked ZKSessionExpireListener.
>> I
>>>>>>>> was
>>>>>>>> just wondering if anyone have ever encountered similar issue
>> before
>>>>>>>> and
>>>>>>>> what I can do at zookeeper side to prevent this?
>>>>>>>>
>>>>>>>> Another problem is that
>> createEphemeralPathExpectConflictHandleZKBug
>>>>>>>> call is wrapped in a while(true) loop which runs forever until the
>>>>>>>> ephemeral node is created. Would it be better that we can employ
>> an
>>>>>>>> exponential retry policy with a max number of retries so that it
>>> has a
>>>>>>>> chance to re-throw the exception back to caller and let caller
>>> handle
>>>>>>>> it in
>>>>>>>> situation like above?
>>>>>>>>
>>>>>>>>
>>>>
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>


Re: Got conflicted ephemeral node exception for several hours

Posted by tao xiao <xi...@gmail.com>.
Finding out the root cause is definitely something we should do, and I hope
the new Java consumer API will not have such issues anymore. Here is my
observation of the issue: the zk retry was caused by a session timeout. In
theory the ephemeral node should be removed after the session is terminated
by zk, but in fact the node was still present after the session was gone. I
saw lots of slow fsync warnings in the zk log like the one below, but I am
not sure if the warning has anything to do with the error I encountered.

2015-07-12 22:01:21,195 [myid:2] - WARN  [SyncThread:2:FileTxnLog@334] -
fsync-ing the write ahead log in SyncThread:2 took 1023ms which will
adversely effect operation latency. See the ZooKeeper troubleshooting guide


I know that at LinkedIn ZK is installed on SSDs; does that help with this?

On Mon, 13 Jul 2015 at 12:50 Mayuresh Gharat <gh...@gmail.com>
wrote:

> That would solve this. But it looks like a work around. We need to check
> why this happens exactly and get to the root cause. What do you think?
> Getting to the root cause of this might be really useful.
>
> Thanks,
>
> Mayuresh
>
> On Sun, Jul 12, 2015 at 8:45 PM, tao xiao <xi...@gmail.com> wrote:
>
> > Restart the consumers does fix the issue. But since the zk retry is
> wrapped
> > in an infinite loop it doesn't give a chance to consumer to respond it
> > until some one notices and restarts. Why I suggest to have a maximum
> retry
> > policy is if max retry is reached it can invoke a customer handler which
> I
> > can then inject a restart call so that it can remedy itself without
> > people's attention.
> >
> > On Mon, 13 Jul 2015 at 11:36 Jiangjie Qin <jq...@linkedin.com.invalid>
> > wrote:
> >
> > > Hi Tao,
> > >
> > > We see this error from time to time but did not think of this as a big
> > > issue. Any reason it bothers you much?
> > > I'm not sure if throwing exception to user on this exception is a good
> > > handling or not. What are user supposed to do in that case other than
> > > retry?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On 7/12/15, 7:16 PM, "tao xiao" <xi...@gmail.com> wrote:
> > >
> > > >We saw the error again in our cluster.  Anyone has the same issue
> > before?
> > > >
> > > >On Fri, 10 Jul 2015 at 13:26 tao xiao <xi...@gmail.com> wrote:
> > > >
> > > >> Bump the thread. Any help would be appreciated.
> > > >>
> > > >> On Wed, 8 Jul 2015 at 20:09 tao xiao <xi...@gmail.com> wrote:
> > > >>
> > > >>> Additional info
> > > >>> Kafka version: 0.8.2.1
> > > >>> zookeeper: 3.4.6
> > > >>>
> > > >>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xi...@gmail.com> wrote:
> > > >>>
> > > >>>> Hi team,
> > > >>>>
> > > >>>> I have 10 high level consumers connecting to Kafka and one of them
> > > >>>>kept
> > > >>>> complaining "conflicted ephemeral node" for about 8 hours. The log
> > was
> > > >>>> filled with below exception
> > > >>>>
> > > >>>> [2015-07-07 14:03:51,615] INFO conflict in
> > > >>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
> > > >>>>
> > >
> >
> >>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
> > > >>>>amp":"1436275631510"}
> > > >>>> stored data:
> > > >>>>
> > >
> >
> >>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
> > > >>>>amp":"1436275558570"}
> > > >>>> (kafka.utils.ZkUtils$)
> > > >>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral
> > node
> > > >>>>
> > >
> >
> >>>>[{"version":1,"subscription":{"test.*":1},"pattern":"white_list","times
> > > >>>>tamp":"1436275631510"}]
> > > >>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back
> in
> > a
> > > >>>> different session, hence I will backoff for this node to be
> deleted
> > by
> > > >>>> Zookeeper and retry (kafka.utils.ZkUtils$)
> > > >>>>
> > > >>>> In the meantime zookeeper reported below exception for the same
> time
> > > >>>>span
> > > >>>>
> > > >>>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
> > > >>>> cport:-1)::PrepRequestProcessor@645] - Got user-level
> > KeeperException
> > > >>>> when processing sessionid:0x44e657ff19c0019 type:create
> cxid:0x7a26
> > > >>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
> > > >>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c
> > > >>>>Error:KeeperErrorCode
> > > >>>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
> > > >>>>
> > > >>>> At the end zookeeper timed out the session and consumers triggered
> > > >>>> rebalance.
> > > >>>>
> > > >>>> I know that conflicted ephemeral node warning is to handle a
> > zookeeper
> > > >>>> bug that session expiration and ephemeral node deletion are not
> done
> > > >>>> atomically but as indicated from zookeeper log the zookeeper never
> > > >>>>got a
> > > >>>> chance to delete the ephemeral node which made me think that the
> > > >>>>session
> > > >>>> was not expired at that time. And for some reason zookeeper fired
> > > >>>>session
> > > >>>> expire event which subsequently invoked ZKSessionExpireListener.
> I
> > > >>>>was
> > > >>>> just wondering if anyone have ever encountered similar issue
> before
> > > >>>>and
> > > >>>> what I can do at zookeeper side to prevent this?
> > > >>>>
> > > >>>> Another problem is that
> createEphemeralPathExpectConflictHandleZKBug
> > > >>>> call is wrapped in a while(true) loop which runs forever until the
> > > >>>> ephemeral node is created. Would it be better that we can employ
> an
> > > >>>> exponential retry policy with a max number of retries so that it
> > has a
> > > >>>> chance to re-throw the exception back to caller and let caller
> > handle
> > > >>>>it in
> > > >>>> situation like above?
> > > >>>>
> > > >>>>
> > >
> > >
> >
>
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>

Re: Got conflicted ephemeral node exception for several hours

Posted by Mayuresh Gharat <gh...@gmail.com>.
That would solve this. But it looks like a workaround. We need to check
why this happens exactly and get to the root cause. What do you think?
Getting to the root cause of this might be really useful.

Thanks,

Mayuresh

On Sun, Jul 12, 2015 at 8:45 PM, tao xiao <xi...@gmail.com> wrote:

> Restart the consumers does fix the issue. But since the zk retry is wrapped
> in an infinite loop it doesn't give a chance to consumer to respond it
> until some one notices and restarts. Why I suggest to have a maximum retry
> policy is if max retry is reached it can invoke a customer handler which I
> can then inject a restart call so that it can remedy itself without
> people's attention.
>
> On Mon, 13 Jul 2015 at 11:36 Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
> > Hi Tao,
> >
> > We see this error from time to time but did not think of this as a big
> > issue. Any reason it bothers you much?
> > I'm not sure if throwing exception to user on this exception is a good
> > handling or not. What are user supposed to do in that case other than
> > retry?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On 7/12/15, 7:16 PM, "tao xiao" <xi...@gmail.com> wrote:
> >
> > >We saw the error again in our cluster.  Anyone has the same issue
> before?
> > >
> > >On Fri, 10 Jul 2015 at 13:26 tao xiao <xi...@gmail.com> wrote:
> > >
> > >> Bump the thread. Any help would be appreciated.
> > >>
> > >> On Wed, 8 Jul 2015 at 20:09 tao xiao <xi...@gmail.com> wrote:
> > >>
> > >>> Additional info
> > >>> Kafka version: 0.8.2.1
> > >>> zookeeper: 3.4.6
> > >>>
> > >>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xi...@gmail.com> wrote:
> > >>>
> > >>>> Hi team,
> > >>>>
> > >>>> I have 10 high level consumers connecting to Kafka and one of them
> > >>>>kept
> > >>>> complaining "conflicted ephemeral node" for about 8 hours. The log
> was
> > >>>> filled with below exception
> > >>>>
> > >>>> [2015-07-07 14:03:51,615] INFO conflict in
> > >>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
> > >>>>
> >
> >>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
> > >>>>amp":"1436275631510"}
> > >>>> stored data:
> > >>>>
> >
> >>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
> > >>>>amp":"1436275558570"}
> > >>>> (kafka.utils.ZkUtils$)
> > >>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral
> node
> > >>>>
> >
> >>>>[{"version":1,"subscription":{"test.*":1},"pattern":"white_list","times
> > >>>>tamp":"1436275631510"}]
> > >>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in
> a
> > >>>> different session, hence I will backoff for this node to be deleted
> by
> > >>>> Zookeeper and retry (kafka.utils.ZkUtils$)
> > >>>>
> > >>>> In the meantime zookeeper reported below exception for the same time
> > >>>>span
> > >>>>
> > >>>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
> > >>>> cport:-1)::PrepRequestProcessor@645] - Got user-level
> KeeperException
> > >>>> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
> > >>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
> > >>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c
> > >>>>Error:KeeperErrorCode
> > >>>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
> > >>>>
> > >>>> At the end zookeeper timed out the session and consumers triggered
> > >>>> rebalance.
> > >>>>
> > >>>> I know that conflicted ephemeral node warning is to handle a
> zookeeper
> > >>>> bug that session expiration and ephemeral node deletion are not done
> > >>>> atomically but as indicated from zookeeper log the zookeeper never
> > >>>>got a
> > >>>> chance to delete the ephemeral node which made me think that the
> > >>>>session
> > >>>> was not expired at that time. And for some reason zookeeper fired
> > >>>>session
> > >>>> expire event which subsequently invoked ZKSessionExpireListener.  I
> > >>>>was
> > >>>> just wondering if anyone have ever encountered similar issue before
> > >>>>and
> > >>>> what I can do at zookeeper side to prevent this?
> > >>>>
> > >>>> Another problem is that createEphemeralPathExpectConflictHandleZKBug
> > >>>> call is wrapped in a while(true) loop which runs forever until the
> > >>>> ephemeral node is created. Would it be better that we can employ an
> > >>>> exponential retry policy with a max number of retries so that it
> has a
> > >>>> chance to re-throw the exception back to caller and let caller
> handle
> > >>>>it in
> > >>>> situation like above?
> > >>>>
> > >>>>
> >
> >
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Got conflicted ephemeral node exception for several hours

Posted by tao xiao <xi...@gmail.com>.
Restarting the consumers does fix the issue. But since the zk retry is
wrapped in an infinite loop, it doesn't give the consumer a chance to respond
until someone notices and restarts it. The reason I suggest a maximum retry
policy is that once the max retry count is reached it can invoke a custom
handler, into which I can inject a restart call so that the consumer can
remedy itself without anyone's attention.
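
Roughly, the hook could look like this (the names are made up; this is not
an existing Kafka interface, just an illustration of the idea):

// Hypothetical callback invoked once the bounded retry gives up.
trait RegistrationFailureHandler {
  def onRetriesExhausted(path: String, cause: Throwable): Unit
}

// Example handler that bounces the consumer programmatically instead of
// waiting for a human to notice the stuck registration loop.
class RestartingHandler(restartConsumer: () => Unit)
    extends RegistrationFailureHandler {
  override def onRetriesExhausted(path: String, cause: Throwable): Unit = {
    System.err.println(s"Giving up on $path after retries: ${cause.getMessage}")
    restartConsumer()
  }
}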

On Mon, 13 Jul 2015 at 11:36 Jiangjie Qin <jq...@linkedin.com.invalid> wrote:

> Hi Tao,
>
> We see this error from time to time but did not think of this as a big
> issue. Any reason it bothers you much?
> I'm not sure if throwing exception to user on this exception is a good
> handling or not. What are user supposed to do in that case other than
> retry?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On 7/12/15, 7:16 PM, "tao xiao" <xi...@gmail.com> wrote:
>
> >We saw the error again in our cluster.  Anyone has the same issue before?
> >
> >On Fri, 10 Jul 2015 at 13:26 tao xiao <xi...@gmail.com> wrote:
> >
> >> Bump the thread. Any help would be appreciated.
> >>
> >> On Wed, 8 Jul 2015 at 20:09 tao xiao <xi...@gmail.com> wrote:
> >>
> >>> Additional info
> >>> Kafka version: 0.8.2.1
> >>> zookeeper: 3.4.6
> >>>
> >>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xi...@gmail.com> wrote:
> >>>
> >>>> Hi team,
> >>>>
> >>>> I have 10 high level consumers connecting to Kafka and one of them
> >>>>kept
> >>>> complaining "conflicted ephemeral node" for about 8 hours. The log was
> >>>> filled with below exception
> >>>>
> >>>> [2015-07-07 14:03:51,615] INFO conflict in
> >>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
> >>>>
> >>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
> >>>>amp":"1436275631510"}
> >>>> stored data:
> >>>>
> >>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
> >>>>amp":"1436275558570"}
> >>>> (kafka.utils.ZkUtils$)
> >>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
> >>>>
> >>>>[{"version":1,"subscription":{"test.*":1},"pattern":"white_list","times
> >>>>tamp":"1436275631510"}]
> >>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
> >>>> different session, hence I will backoff for this node to be deleted by
> >>>> Zookeeper and retry (kafka.utils.ZkUtils$)
> >>>>
> >>>> In the meantime zookeeper reported below exception for the same time
> >>>>span
> >>>>
> >>>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
> >>>> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
> >>>> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
> >>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
> >>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c
> >>>>Error:KeeperErrorCode
> >>>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
> >>>>
> >>>> At the end zookeeper timed out the session and consumers triggered
> >>>> rebalance.
> >>>>
> >>>> I know that conflicted ephemeral node warning is to handle a zookeeper
> >>>> bug that session expiration and ephemeral node deletion are not done
> >>>> atomically but as indicated from zookeeper log the zookeeper never
> >>>>got a
> >>>> chance to delete the ephemeral node which made me think that the
> >>>>session
> >>>> was not expired at that time. And for some reason zookeeper fired
> >>>>session
> >>>> expire event which subsequently invoked ZKSessionExpireListener.  I
> >>>>was
> >>>> just wondering if anyone have ever encountered similar issue before
> >>>>and
> >>>> what I can do at zookeeper side to prevent this?
> >>>>
> >>>> Another problem is that createEphemeralPathExpectConflictHandleZKBug
> >>>> call is wrapped in a while(true) loop which runs forever until the
> >>>> ephemeral node is created. Would it be better that we can employ an
> >>>> exponential retry policy with a max number of retries so that it has a
> >>>> chance to re-throw the exception back to caller and let caller handle
> >>>>it in
> >>>> situation like above?
> >>>>
> >>>>
>
>

Re: Got conflicted ephemeral node exception for several hours

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
Hi Tao,

We see this error from time to time but did not think of it as a big
issue. Any reason it bothers you so much?
I'm not sure whether throwing the exception to the user in this case is good
handling or not. What is the user supposed to do in that case other than
retry?

Thanks,

Jiangjie (Becket) Qin

On 7/12/15, 7:16 PM, "tao xiao" <xi...@gmail.com> wrote:

>We saw the error again in our cluster.  Anyone has the same issue before?
>
>On Fri, 10 Jul 2015 at 13:26 tao xiao <xi...@gmail.com> wrote:
>
>> Bump the thread. Any help would be appreciated.
>>
>> On Wed, 8 Jul 2015 at 20:09 tao xiao <xi...@gmail.com> wrote:
>>
>>> Additional info
>>> Kafka version: 0.8.2.1
>>> zookeeper: 3.4.6
>>>
>>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xi...@gmail.com> wrote:
>>>
>>>> Hi team,
>>>>
>>>> I have 10 high level consumers connecting to Kafka and one of them
>>>>kept
>>>> complaining "conflicted ephemeral node" for about 8 hours. The log was
>>>> filled with below exception
>>>>
>>>> [2015-07-07 14:03:51,615] INFO conflict in
>>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
>>>> 
>>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
>>>>amp":"1436275631510"}
>>>> stored data:
>>>> 
>>>>{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timest
>>>>amp":"1436275558570"}
>>>> (kafka.utils.ZkUtils$)
>>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
>>>> 
>>>>[{"version":1,"subscription":{"test.*":1},"pattern":"white_list","times
>>>>tamp":"1436275631510"}]
>>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
>>>> different session, hence I will backoff for this node to be deleted by
>>>> Zookeeper and retry (kafka.utils.ZkUtils$)
>>>>
>>>> In the meantime zookeeper reported below exception for the same time
>>>>span
>>>>
>>>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
>>>> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
>>>> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
>>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
>>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c
>>>>Error:KeeperErrorCode
>>>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
>>>>
>>>> At the end zookeeper timed out the session and consumers triggered
>>>> rebalance.
>>>>
>>>> I know that conflicted ephemeral node warning is to handle a zookeeper
>>>> bug that session expiration and ephemeral node deletion are not done
>>>> atomically but as indicated from zookeeper log the zookeeper never
>>>>got a
>>>> chance to delete the ephemeral node which made me think that the
>>>>session
>>>> was not expired at that time. And for some reason zookeeper fired
>>>>session
>>>> expire event which subsequently invoked ZKSessionExpireListener.  I
>>>>was
>>>> just wondering if anyone have ever encountered similar issue before
>>>>and
>>>> what I can do at zookeeper side to prevent this?
>>>>
>>>> Another problem is that createEphemeralPathExpectConflictHandleZKBug
>>>> call is wrapped in a while(true) loop which runs forever until the
>>>> ephemeral node is created. Would it be better that we can employ an
>>>> exponential retry policy with a max number of retries so that it has a
>>>> chance to re-throw the exception back to caller and let caller handle
>>>>it in
>>>> situation like above?
>>>>
>>>>


Re: Got conflicted ephemeral node exception for several hours

Posted by tao xiao <xi...@gmail.com>.
We saw the error again in our cluster. Has anyone seen the same issue before?

On Fri, 10 Jul 2015 at 13:26 tao xiao <xi...@gmail.com> wrote:

> Bump the thread. Any help would be appreciated.
>
> On Wed, 8 Jul 2015 at 20:09 tao xiao <xi...@gmail.com> wrote:
>
>> Additional info
>> Kafka version: 0.8.2.1
>> zookeeper: 3.4.6
>>
>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xi...@gmail.com> wrote:
>>
>>> Hi team,
>>>
>>> I have 10 high level consumers connecting to Kafka and one of them kept
>>> complaining "conflicted ephemeral node" for about 8 hours. The log was
>>> filled with below exception
>>>
>>> [2015-07-07 14:03:51,615] INFO conflict in
>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}
>>> stored data:
>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275558570"}
>>> (kafka.utils.ZkUtils$)
>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
>>> [{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}]
>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
>>> different session, hence I will backoff for this node to be deleted by
>>> Zookeeper and retry (kafka.utils.ZkUtils$)
>>>
>>> In the meantime zookeeper reported below exception for the same time span
>>>
>>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
>>> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
>>> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c Error:KeeperErrorCode
>>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
>>>
>>> At the end zookeeper timed out the session and consumers triggered
>>> rebalance.
>>>
>>> I know that conflicted ephemeral node warning is to handle a zookeeper
>>> bug that session expiration and ephemeral node deletion are not done
>>> atomically but as indicated from zookeeper log the zookeeper never got a
>>> chance to delete the ephemeral node which made me think that the session
>>> was not expired at that time. And for some reason zookeeper fired session
>>> expire event which subsequently invoked ZKSessionExpireListener.  I was
>>> just wondering if anyone have ever encountered similar issue before and
>>> what I can do at zookeeper side to prevent this?
>>>
>>> Another problem is that createEphemeralPathExpectConflictHandleZKBug
>>> call is wrapped in a while(true) loop which runs forever until the
>>> ephemeral node is created. Would it be better that we can employ an
>>> exponential retry policy with a max number of retries so that it has a
>>> chance to re-throw the exception back to caller and let caller handle it in
>>> situation like above?
>>>
>>>

Re: Got conflicted ephemeral node exception for several hours

Posted by tao xiao <xi...@gmail.com>.
Bump the thread. Any help would be appreciated.

On Wed, 8 Jul 2015 at 20:09 tao xiao <xi...@gmail.com> wrote:

> Additional info
> Kafka version: 0.8.2.1
> zookeeper: 3.4.6
>
> On Wed, 8 Jul 2015 at 20:07 tao xiao <xi...@gmail.com> wrote:
>
>> Hi team,
>>
>> I have 10 high level consumers connecting to Kafka and one of them kept
>> complaining "conflicted ephemeral node" for about 8 hours. The log was
>> filled with below exception
>>
>> [2015-07-07 14:03:51,615] INFO conflict in
>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}
>> stored data:
>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275558570"}
>> (kafka.utils.ZkUtils$)
>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
>> [{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}]
>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
>> different session, hence I will backoff for this node to be deleted by
>> Zookeeper and retry (kafka.utils.ZkUtils$)
>>
>> In the meantime zookeeper reported below exception for the same time span
>>
>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
>> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
>> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c Error:KeeperErrorCode
>> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
>>
>> At the end zookeeper timed out the session and consumers triggered
>> rebalance.
>>
>> I know that conflicted ephemeral node warning is to handle a zookeeper
>> bug that session expiration and ephemeral node deletion are not done
>> atomically but as indicated from zookeeper log the zookeeper never got a
>> chance to delete the ephemeral node which made me think that the session
>> was not expired at that time. And for some reason zookeeper fired session
>> expire event which subsequently invoked ZKSessionExpireListener.  I was
>> just wondering if anyone have ever encountered similar issue before and
>> what I can do at zookeeper side to prevent this?
>>
>> Another problem is that createEphemeralPathExpectConflictHandleZKBug call
>> is wrapped in a while(true) loop which runs forever until the ephemeral
>> node is created. Would it be better that we can employ an exponential retry
>> policy with a max number of retries so that it has a chance to re-throw the
>> exception back to caller and let caller handle it in situation like above?
>>
>>

Re: Got conflicted ephemeral node exception for several hours

Posted by tao xiao <xi...@gmail.com>.
Additional info
Kafka version: 0.8.2.1
zookeeper: 3.4.6

On Wed, 8 Jul 2015 at 20:07 tao xiao <xi...@gmail.com> wrote:

> Hi team,
>
> I have 10 high level consumers connecting to Kafka and one of them kept
> complaining "conflicted ephemeral node" for about 8 hours. The log was
> filled with below exception
>
> [2015-07-07 14:03:51,615] INFO conflict in
> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}
> stored data:
> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275558570"}
> (kafka.utils.ZkUtils$)
> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
> [{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}]
> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
> different session, hence I will backoff for this node to be deleted by
> Zookeeper and retry (kafka.utils.ZkUtils$)
>
> In the meantime zookeeper reported below exception for the same time span
>
> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c Error:KeeperErrorCode
> = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
>
> At the end zookeeper timed out the session and consumers triggered
> rebalance.
>
> I know that conflicted ephemeral node warning is to handle a zookeeper bug
> that session expiration and ephemeral node deletion are not done atomically
> but as indicated from zookeeper log the zookeeper never got a chance to
> delete the ephemeral node which made me think that the session was not
> expired at that time. And for some reason zookeeper fired session expire
> event which subsequently invoked ZKSessionExpireListener.  I was just
> wondering if anyone have ever encountered similar issue before and what I
> can do at zookeeper side to prevent this?
>
> Another problem is that createEphemeralPathExpectConflictHandleZKBug call
> is wrapped in a while(true) loop which runs forever until the ephemeral
> node is created. Would it be better that we can employ an exponential retry
> policy with a max number of retries so that it has a chance to re-throw the
> exception back to caller and let caller handle it in situation like above?
>
>