Posted to dev@kafka.apache.org by Jaikiran Pai <ja...@gmail.com> on 2015/01/24 05:53:11 UTC

Cannot stop Kafka server if zookeeper is shutdown first

I was just playing around with RC2 of 0.8.2 and noticed that if I shut
down ZooKeeper first, I can't shut down the Kafka server at all, since it
goes into a never-ending attempt to reconnect to ZooKeeper. I had to
kill the Kafka process to stop it. I tried it against trunk too and
see the same issue there. Should I file a JIRA for this and see if
I can come up with a patch?

FWIW, here are the unending (and IMO too frequent) attempts to
reconnect. I also have a thread dump which shows that the other thread,
which is trying to complete a controlled shutdown of Kafka, is blocked
forever waiting for ZooKeeper to come up. I can attach it to the JIRA.

[2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
[2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)




-Jaikiran
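
To illustrate the blocked wait described above, here is a minimal Java sketch of a connection wait that gives up after a deadline instead of retrying forever. It is only an illustration under assumptions: the class BoundedConnectionWait and its methods are hypothetical and are not part of ZkClient or Kafka.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a connection-state tracker; not ZkClient's real API.
final class BoundedConnectionWait {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    // Called by whichever thread observes the connection coming up.
    void markConnected() {
        lock.lock();
        try {
            connected = true;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits until connected or until the timeout elapses; returns false on timeout.
    boolean waitUntilConnected(long timeout, TimeUnit unit) throws InterruptedException {
        Date deadline = new Date(System.currentTimeMillis() + unit.toMillis(timeout));
        lock.lock();
        try {
            boolean beforeDeadline = true;
            while (!connected) {
                if (!beforeDeadline) {
                    return false; // deadline passed and still not connected
                }
                // awaitUntil returns false once the deadline has elapsed.
                beforeDeadline = stateChanged.awaitUntil(deadline);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedConnectionWait wait = new BoundedConnectionWait();
        // Nothing ever calls markConnected(), mimicking a ZooKeeper that is down.
        boolean ok = wait.waitUntilConnected(200, TimeUnit.MILLISECONDS);
        System.out.println(ok ? "connected" : "timed out, continuing shutdown");
    }
}
```

With a bounded wait like this, a shutdown path could log the timeout and carry on rather than hang until the process is killed.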

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Harsha <ka...@harsha.io>.
Curator handles SASL connections; see
https://issues.apache.org/jira/browse/KAFKA-1695

On Wed, Feb 4, 2015, at 06:10 AM, Jaikiran Pai wrote:
> FWIW - the ZkClient project team have merged the pull request that I had 
> submitted to allow timeouts on operations:
> https://github.com/sgroschupf/zkclient/pull/29. I heard from Johannes 
> (from the ZkClient project team) that they don't have any specific 
> release date in mind but are willing to release a new version if/when we 
> need one.
> 
> -Jaikiran
> 
> On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> > So I think the current plan is:
> > 1. Add timeout in zkclient
> > 2. Ask zkclient to release new version (we need it for a few other things too)
> > 3. Rebase on new zkclient
> > 4. Fix this jira and the few others that were waiting for the new zkclient
> >
> > Does that make sense?
> >
> > Gwen
> >
> > On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <ja...@gmail.com> wrote:
> >> I just heard back from Stefan, who manages the ZkClient repo, and he seems to
> >> be open to having these changes be part of the ZkClient project. I'll be creating
> >> a pull request for that project to have it reviewed and merged. Although I
> >> haven't heard of exact release plans, Stefan's reply did indicate that the
> >> project could be released after this change is merged.
> >>
> >> -Jaikiran
> >>
> >> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
> >>> Thanks for pointing to that repo!
> >>>
> >>> I just had a look at it and it appears that the project isn't very active
> >>> (going by the lack of recent activity). The latest contribution is from Gwen and
> >>> that was around 3 months back. I haven't found release plans for that
> >>> project or a place to ask about it (filing an issue doesn't seem right to
> >>> ask this question). So I'll get in touch with the repo owner and see what
> >>> his plans for the project are.
> >>>
> >>> -Jaikiran
> >>>
> >>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
> >>>> I did!
> >>>>
> >>>> Thanks for clarifying :)
> >>>>
> >>>> The client that is part of Zookeeper itself actually does support
> >>>> timeouts.
> >>>>
> >>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wa...@gmail.com> wrote:
> >>>>> Hi Jaikiran,
> >>>>>
> >>>>> I think Gwen was talking about contributing to ZkClient project:
> >>>>>
> >>>>> https://github.com/sgroschupf/zkclient
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <ja...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Gwen,
> >>>>>>
> >>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
> >>>>>> replacement.
> >>>>>>
> >>>>>> As for contributing to Zookeeper, yes, that is indeed on my mind, but I
> >>>>>> haven't yet had a chance to really look deeper into Zookeeper or get in
> >>>>>> touch with their dev team to try and explain this potential improvement
> >>>>>> to them. I have no objection to contributing this or something similar
> >>>>>> to Zookeeper directly. I think I should be able to bring this up in the
> >>>>>> Zookeeper dev forum sometime soon, in the next few weekends.
> >>>>>>
> >>>>>> -Jaikiran
> >>>>>>
> >>>>>>
> >>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
> >>>>>>
> >>>>>>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
> >>>>>>> not a replacement. Did I get it right?
> >>>>>>>
> >>>>>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
> >>>>>>> can also use one.
> >>>>>>>
> >>>>>>> However, I'm wondering why not contribute the fix directly to ZKClient
> >>>>>>> project and ask for a release that contains the fix?
> >>>>>>> This will benefit other users of the project who may also need a
> >>>>>>> timeout (that's pretty basic...)
> >>>>>>>
> >>>>>>> As an alternative, if we don't want to collaborate with ZKClient for
> >>>>>>> some reason, forking the project into Kafka will probably give us more
> >>>>>>> control than wrappers and without much downside.
> >>>>>>>
> >>>>>>> Just a thought.
> >>>>>>>
> >>>>>>> Gwen
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
> >>>>>>> <ja...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded
> >>>>>>>> here https://reviews.apache.org/r/30477/. It solves the shutdown
> >>>>>>>> problem and now the server shuts down even when Zookeeper has gone
> >>>>>>>> down before the Kafka server.
> >>>>>>>>
> >>>>>>>> I went with the approach of introducing a custom (enhanced) ZkClient
> >>>>>>>> which for now allows time outs to be optionally specified for certain
> >>>>>>>> operations. I intentionally haven't forced the use of this new
> >>>>>>>> KafkaZkClient all over the code and instead for now have just used it
> >>>>>>>> in the KafkaServer.
> >>>>>>>>
> >>>>>>>> Does this patch look like something worth using?
> >>>>>>>>
> >>>>>>>> -Jaikiran
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
> >>>>>>>>
> >>>>>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for this
> >>>>>>>>> seems to be patching ZkClient. At some point, if we find ourselves
> >>>>>>>>> fiddling too much with ZkClient, it wouldn't hurt to write our own
> >>>>>>>>> little zookeeper client wrapper.
> >>>>>>>>>
> >>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
> >>>>>>>>> <ew...@confluent.io>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of
> >>>>>>>>>> blocking method implementations with waitUntilConnected() calls
> >>>>>>>>>> without any timeouts. Ideally we could just add a version of
> >>>>>>>>>> ZkUtils.getController() with a timeout, but I don't see an easy way
> >>>>>>>>>> to accomplish that with ZkClient.
> >>>>>>>>>>
> >>>>>>>>>> There's at least one other call to ZkUtils besides the one in the
> >>>>>>>>>> stacktrace you gave that would cause the same issue, possibly more
> >>>>>>>>>> that aren't directly called in that method. One ugly solution would
> >>>>>>>>>> be to use an extra thread during shutdown to trigger timeouts, but
> >>>>>>>>>> I'd imagine we probably have other threads that could end up
> >>>>>>>>>> blocking in similar ways.
> >>>>>>>>>>
> >>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track
> >>>>>>>>>> the issue.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
> >>>>>>>>>> jai.forums2013@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> The main culprit is this thread which goes into "forever retry
> >>>>>>>>>>> connection to a closed zookeeper" when I shutdown Kafka (via a
> >>>>>>>>>>> Ctrl + C) after zookeeper has already been shutdown. I have
> >>>>>>>>>>> attached the complete thread dump, but I don't know if it will be
> >>>>>>>>>>> delivered to the mailing list.
> >>>>>>>>>>>
> >>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
> >>>>>>>>>>>    java.lang.Thread.State: TIMED_WAITING (parking)
> >>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
> >>>>>>>>>>>         - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> >>>>>>>>>>>         at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
> >>>>>>>>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
> >>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
> >>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
> >>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
> >>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
> >>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> >>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> >>>>>>>>>>>         at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
> >>>>>>>>>>>         at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
> >>>>>>>>>>>         at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
> >>>>>>>>>>>         at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
> >>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
> >>>>>>>>>>>         at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
> >>>>>>>>>>>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> >>>>>>>>>>>         at kafka.utils.Logging$class.swallow(Logging.scala:94)
> >>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
> >>>>>>>>>>>         at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
> >>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
> >>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> >>>>>>>>>>>
> >>>>>>>>>>> -Jaikiran
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> For a clean shutdown, the broker tries to talk to the controller and
> >>>>>>>>>>>> also issues reads to zookeeper. Possibly that is where it tries to
> >>>>>>>>>>>> reconnect to zk. It will help to look at the thread dump.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks
> >>>>>>>>>>>> Neha
> >>>>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Ewen
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>
> 
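
The "extra thread during shutdown" idea quoted above can be sketched roughly as follows: run the potentially blocking read on a helper thread and stop waiting after a timeout. This is a hedged illustration, not the patch under review; TimedCall, callWithTimeout, and the thread name are hypothetical, and the approach only unblocks cleanly if the underlying call responds to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of Kafka or ZkClient.
final class TimedCall {
    // Runs a possibly blocking call on a daemon helper thread and returns
    // fallback if the call fails or does not finish within timeoutMs.
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs, T fallback)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "shutdown-zk-helper");
            t.setDaemon(true); // do not keep the JVM alive if the call never returns
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // interrupt the helper thread
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a read that blocks while ZooKeeper is unreachable.
        Callable<String> blockedRead = () -> {
            Thread.sleep(60_000);
            return "controller-id";
        };
        String result = callWithTimeout(blockedRead, 200, null);
        System.out.println(result == null ? "gave up waiting" : result);
    }
}
```

As noted in the thread, this is a workaround at the call site; a timeout inside the client library covers other blocked threads as well.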

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Joe Stein <jo...@stealth.ly>.
<< A more abstract interface to the distributed coordination service that
could be configured to use alternatives like consul or etcd would be very
useful imho.

+1

If we make the Curator changes, let's please take the time to first build
an interface so we can expose both metadata and async notification
services through it. I have been playing around with etcd 2.0 this last
week since it came out. I really like some of the concepts, like
waitIndex (so you don't miss watches), and the compare-and-set/delete stuff
is great too. The etcd 2.0 REST call returns, and when it does we just have
to fire the same event back into the API like the zookeeper code will do.
Coming up with what that looks like is a KIP, as would be changing to and
using Curator
(https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals)
for those changes. I think as long as the project has two implementations,
we know it works for different groups of folks.

I think whoever might be interested in working on that should chime in and
write up the KIP, and let's start getting that discussion / release planning
/ work moving ahead.

- Joe Stein
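
As a rough illustration of the kind of interface discussed above, here is a minimal Java sketch that exposes metadata reads, compare-and-set, and change notifications behind one abstraction. Everything in it is an assumption for illustration: CoordinationService and InMemoryCoordinationService are hypothetical names, the in-memory implementation stands in for ZooKeeper or etcd, and etcd's waitIndex is not modelled.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical abstraction; these names do not come from Kafka, Curator, or etcd.
interface CoordinationService {
    // Metadata read: current value for a key, if any.
    Optional<String> get(String key);

    // Sets key to update only if its current value equals expected
    // (expected == null means "key must be absent"). Returns true on success.
    boolean compareAndSet(String key, String expected, String update);

    // Async notification: listener receives (key, newValue) after each change.
    void watch(BiConsumer<String, String> listener);
}

// In-memory stand-in so the sketch runs without ZooKeeper or etcd.
final class InMemoryCoordinationService implements CoordinationService {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, String>> listeners = new CopyOnWriteArrayList<>();

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    @Override
    public boolean compareAndSet(String key, String expected, String update) {
        boolean changed = (expected == null)
                ? data.putIfAbsent(key, update) == null
                : data.replace(key, expected, update);
        if (changed) {
            for (BiConsumer<String, String> listener : listeners) {
                listener.accept(key, update);
            }
        }
        return changed;
    }

    @Override
    public void watch(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }
}

final class CoordinationDemo {
    public static void main(String[] args) {
        CoordinationService svc = new InMemoryCoordinationService();
        svc.watch((key, value) -> System.out.println(key + " -> " + value));
        svc.compareAndSet("/controller", null, "broker-1");       // succeeds, listener fires
        svc.compareAndSet("/controller", "broker-2", "broker-3"); // fails, value unchanged
        System.out.println(svc.get("/controller").orElse("none"));
    }
}
```

Callers would depend only on the interface, so a ZooKeeper-backed and an etcd-backed implementation could be swapped by configuration.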

On Wed, Feb 4, 2015 at 11:27 AM, Dana Powers <da...@rd.io> wrote:

> While on the subject of zkclient, also consider KAFKA-1793.  A more
> abstract interface to the distributed coordination service that could be
> configured to use alternatives like consul or etcd would be very useful
> imho.
>
> Dana
> >>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to
> >>>>>>>>>>>>> server
> >>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
> >>>>>>>>>>>>> using
> >>>>>>>>>>>>> SASL
> >>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>> server
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  null,
> >>>>>>>>>>>>
> >>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>> reconnect
> >>>>>>>>>>>
> >>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to
> >>>>>>>>>>>>> server
> >>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
> >>>>>>>>>>>>> using
> >>>>>>>>>>>>> SASL
> >>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>> server
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  null,
> >>>>>>>>>>>>
> >>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>> reconnect
> >>>>>>>>>>>
> >>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Jaikiran
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>    --
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>> Ewen
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>  --
> >>>>> -- Guozhang
> >>>>>
> >>>>
> >>>
>

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Dana Powers <da...@rd.io>.
While on the subject of zkclient, also consider KAFKA-1793.  A more
abstract interface to the distributed coordination service that could be
configured to use alternatives like consul or etcd would be very useful
imho.
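
To make the idea concrete, here is a minimal sketch in Java of what such an
abstraction might look like. Everything in it is an assumption made for
illustration: CoordinationService, InMemoryCoordinationService and the
"/controller" path are hypothetical names, not anything proposed in
KAFKA-1793, and a real backend (ZooKeeper, consul, etcd) would also need
watches, sessions and timeouts.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class CoordinationSketch {

    /** Hypothetical minimal contract the broker could code against. */
    public interface CoordinationService {
        void write(String path, String data);

        /** Returns the stored value, or empty if nothing is stored at the path. */
        Optional<String> read(String path);

        void close();
    }

    /** In-memory stand-in that only illustrates the shape of the interface. */
    public static final class InMemoryCoordinationService implements CoordinationService {
        private final Map<String, String> nodes = new ConcurrentHashMap<>();

        @Override
        public void write(String path, String data) {
            nodes.put(path, data);
        }

        @Override
        public Optional<String> read(String path) {
            return Optional.ofNullable(nodes.get(path));
        }

        @Override
        public void close() {
            nodes.clear();
        }
    }

    public static void main(String[] args) {
        // Callers depend only on the interface, so the backend can be swapped.
        CoordinationService service = new InMemoryCoordinationService();
        service.write("/controller", "broker-1");
        System.out.println(service.read("/controller").orElse("none"));
        service.close();
    }
}
```

The point of the sketch is only that callers hold a CoordinationService
reference, so which backend sits behind it becomes a configuration choice.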

Dana
FWIW - the ZkClient project team have merged the pull request that I had
submitted to allow for timeouts to operations
https://github.com/sgroschupf/zkclient/pull/29. I heard from Johannes (from
the ZkClient project team) that they don't have any specific release date in
mind but are willing to release a new version if/when we need one.

-Jaikiran

On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:

> So I think the current plan is:
> 1. Add timeout in zkclient
> 2. Ask zkclient to release a new version (we need it for a few other things
> too)
> 3. Rebase on new zkclient
> 4. Fix this jira and the few others that were waiting for the new zkclient
>
> Does that make sense?
>
> Gwen
>
> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <ja...@gmail.com>
> wrote:
>
>> I just heard back from Stefan, who manages the ZkClient repo and he seems
>> to
>> be open to have these changes be part of ZkClient project. I'll be
>> creating
>> a pull request for that project to have it reviewed and merged. Although I
>> haven't heard of exact release plans, Stefan's reply did indicate that the
>> project could be released after this change is merged.
>>
>> -Jaikiran
>>
>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
>>
>>> Thanks for pointing to that repo!
>>>
>>> I just had a look at it and it appears that the project isn't very active
>>> (going by the lack of activity). The latest contribution is from Gwen and
>>> that was around 3 months back. I haven't found release plans for that
>>> project or a place to ask about it (filing an issue doesn't seem right to
>>> ask this question). So I'll get in touch with the repo owner and see what
>>> his plans for the project are.
>>>
>>> -Jaikiran
>>>
>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
>>>
>>>> I did!
>>>>
>>>> Thanks for clarifying :)
>>>>
>>>> The client that is part of Zookeeper itself actually does support
>>>> timeouts.
>>>>
>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Jaikiran,
>>>>>
>>>>> I think Gwen was talking about contributing to ZkClient project:
>>>>>
>>>>> https://github.com/sgroschupf/zkclient
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2013@gmail.com
>>>>> >
>>>>> wrote:
>>>>>
>>>>>  Hi Gwen,
>>>>>>
>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
>>>>>> replacement.
>>>>>>
>>>>>> As for contributing to Zookeeper, yes, that is indeed on my mind, but I
>>>>>> haven't yet had a chance to really look deeper into Zookeeper or get
>>>>>> in
>>>>>> touch with their dev team to try and explain this potential
>>>>>> improvement
>>>>>> to
>>>>>> them. I have no objection to contributing this or something similar to
>>>>>> Zookeeper directly. I think I should be able to bring this up in the
>>>>>> Zookeeper dev forum, sometime soon in the next few weekends.
>>>>>>
>>>>>> -Jaikiran
>>>>>>
>>>>>>
>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>>>>>
>>>>>>  It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>>>>>>> not a replacement. Did I get it right?
>>>>>>>
>>>>>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>>>>>>> can also use one.
>>>>>>>
>>>>>>> However, I'm wondering why not contribute the fix directly to
>>>>>>> ZKClient
>>>>>>> project and ask for a release that contains the fix?
>>>>>>> This will benefit other users of the project who may also need a
>>>>>>> timeout (that's pretty basic...)
>>>>>>>
>>>>>>> As an alternative, if we don't want to collaborate with ZKClient for
>>>>>>> some reason, forking the project into Kafka will probably give us
>>>>>>> more
>>>>>>> control than wrappers and without much downside.
>>>>>>>
>>>>>>> Just a thought.
>>>>>>>
>>>>>>> Gwen
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
>>>>>>> <ja...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>  Neha, Ewen (and others), my initial attempt to solve this is
>>>>>>>> uploaded
>>>>>>>> here
>>>>>>>> https://reviews.apache.org/r/30477/. It solves the shutdown problem
>>>>>>>> and
>>>>>>>> now
>>>>>>>> the server shuts down even when Zookeeper has gone down before the
>>>>>>>> Kafka
>>>>>>>> server.
>>>>>>>>
>>>>>>>> I went with the approach of introducing a custom (enhanced) ZkClient
>>>>>>>> which
>>>>>>>> for now allows time outs to be optionally specified for certain
>>>>>>>> operations.
>>>>>>>> I intentionally haven't forced the use of this new KafkaZkClient all
>>>>>>>> over
>>>>>>>> the code and instead for now have just used it in the KafkaServer.
>>>>>>>>
>>>>>>>> Does this patch look like something worth using?
>>>>>>>>
>>>>>>>> -Jaikiran
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>>>>>>
>>>>>>>>  Ewen is right. ZkClient APIs are blocking and the right fix for
>>>>>>>>> this
>>>>>>>>> seems
>>>>>>>>> to be patching ZkClient. At some point, if we find ourselves
>>>>>>>>> fiddling
>>>>>>>>> too
>>>>>>>>> much with ZkClient, it wouldn't hurt to write our own little
>>>>>>>>> zookeeper
>>>>>>>>> client wrapper.
>>>>>>>>>
>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>>>>>>>>> <ew...@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>    Looks like a bug to me -- the underlying ZK library wraps a lot
>>>>>>>>> of
>>>>>>>>>
>>>>>>>>>> blocking
>>>>>>>>>> method implementations with waitUntilConnected() calls without any
>>>>>>>>>> timeouts. Ideally we could just add a version of
>>>>>>>>>> ZkUtils.getController()
>>>>>>>>>> with a timeout, but I don't see an easy way to accomplish that
>>>>>>>>>> with
>>>>>>>>>> ZkClient.
>>>>>>>>>>
>>>>>>>>>> There's at least one other call to ZkUtils besides the one in the
>>>>>>>>>> stacktrace you gave that would cause the same issue, possibly more
>>>>>>>>>> that
>>>>>>>>>> aren't directly called in that method. One ugly solution would be
>>>>>>>>>> to
>>>>>>>>>> use
>>>>>>>>>> an
>>>>>>>>>> extra thread during shutdown to trigger timeouts, but I'd imagine
>>>>>>>>>> we
>>>>>>>>>> probably have other threads that could end up blocking in similar
>>>>>>>>>> ways.
>>>>>>>>>>
>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track
>>>>>>>>>> the
>>>>>>>>>> issue.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
>>>>>>>>>> jai.forums2013@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>    The main culprit is this thread which goes into "forever retry
>>>>>>>>>>
>>>>>>>>>>> connection
>>>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a Ctrl + C)
>>>>>>>>>>> after
>>>>>>>>>>> zookeeper has already been shutdown. I have attached the complete
>>>>>>>>>>> thread
>>>>>>>>>>> dump, but I don't know if it will be delivered to the mailing
>>>>>>>>>>> list.
>>>>>>>>>>>
>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>>>>>>        java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
>>>>>>>>>>>         - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>>>>>         at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>>>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>>>>>>         at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>>>>>         at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>>>>>         at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>>>>>>         at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>>>>>         at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>>>>>>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>>>>>         at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>>>>>         at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>>>>>
>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>>>>>>
>>>>>>>>>>>    For a clean shutdown, the broker tries to talk to the
>>>>>>>>>>> controller
>>>>>>>>>>> and
>>>>>>>>>>> also
>>>>>>>>>>> issues reads to zookeeper. Possibly that is where it tries to
>>>>>>>>>>>
>>>>>>>>>>>> reconnect
>>>>>>>>>>>>
>>>>>>>>>>>>  to
>>>>>>>>>>> zk. It will help to look at the thread dump.
>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Neha
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
>>>>>>>>>>>> jai.forums2013@gmail.com
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>      I was just playing around with the RC2 of 0.8.2 and noticed
>>>>>>>>>>>> that
>>>>>>>>>>>> if I
>>>>>>>>>>>>
>>>>>>>>>>>>  shutdown zookeeper first I can't shutdown Kafka server at all
>>>>>>>>>>>>> since
>>>>>>>>>>>>> it
>>>>>>>>>>>>> goes
>>>>>>>>>>>>> into a never ending attempt to reconnect with zookeeper. I had
>>>>>>>>>>>>> to
>>>>>>>>>>>>> kill
>>>>>>>>>>>>> the
>>>>>>>>>>>>> Kafka process to stop it. I tried it against trunk too and
>>>>>>>>>>>>> there
>>>>>>>>>>>>> too I
>>>>>>>>>>>>> see
>>>>>>>>>>>>> the same issue. Should I file a JIRA for this and see if I can
>>>>>>>>>>>>> come
>>>>>>>>>>>>> up
>>>>>>>>>>>>> with
>>>>>>>>>>>>> a patch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at
>>>>>>>>>>>>> trying
>>>>>>>>>>>>> to
>>>>>>>>>>>>> reconnect. I've a thread dump too which shows that the other
>>>>>>>>>>>>> thread
>>>>>>>>>>>>>
>>>>>>>>>>>>>  which
>>>>>>>>>>>>
>>>>>>>>>>> is trying to complete a controlled shutdown of Kafka is blocked
>>>>>>>>>>>
>>>>>>>>>>>> forever
>>>>>>>>>>>>> for
>>>>>>>>>>>>> the zookeeper to be up. I can attach it to the JIRA.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for
>>>>>>>>>>>>> server
>>>>>>>>>>>>>
>>>>>>>>>>>>>  null,
>>>>>>>>>>>>
>>>>>>>>>>> unexpected error, closing socket connection and attempting
>>>>>>>>>>> reconnect
>>>>>>>>>>>
>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
>>>>>>>>>>>>> Method)
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
>>>>>>>>>>>>> doTransport(
>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to
>>>>>>>>>>>>> server
>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
>>>>>>>>>>>>> using
>>>>>>>>>>>>> SASL
>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for
>>>>>>>>>>>>> server
>>>>>>>>>>>>>
>>>>>>>>>>>>>  null,
>>>>>>>>>>>>
>>>>>>>>>>> unexpected error, closing socket connection and attempting
>>>>>>>>>>> reconnect
>>>>>>>>>>>
>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
>>>>>>>>>>>>> Method)
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
>>>>>>>>>>>>> doTransport(
>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to
>>>>>>>>>>>>> server
>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
>>>>>>>>>>>>> using
>>>>>>>>>>>>> SASL
>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for
>>>>>>>>>>>>> server
>>>>>>>>>>>>>
>>>>>>>>>>>>>  null,
>>>>>>>>>>>>
>>>>>>>>>>> unexpected error, closing socket connection and attempting
>>>>>>>>>>> reconnect
>>>>>>>>>>>
>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
>>>>>>>>>>>>> Method)
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
>>>>>>>>>>>>> doTransport(
>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to
>>>>>>>>>>>>> server
>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
>>>>>>>>>>>>> using
>>>>>>>>>>>>> SASL
>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for
>>>>>>>>>>>>> server
>>>>>>>>>>>>>
>>>>>>>>>>>>>  null,
>>>>>>>>>>>>
>>>>>>>>>>> unexpected error, closing socket connection and attempting
>>>>>>>>>>> reconnect
>>>>>>>>>>>
>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
>>>>>>>>>>>>> Method)
>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
>>>>>>>>>>>>> doTransport(
>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    --
>>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>> Ewen
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  --
>>>>> -- Guozhang
>>>>>
>>>>
>>>

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Jaikiran Pai <ja...@gmail.com>.
FWIW - the ZkClient project team have merged the pull request that I had 
submitted to allow for timeouts to operations 
https://github.com/sgroschupf/zkclient/pull/29. I heard from Johannes 
(from the ZkClient project team) that they don't have any specific 
release date in mind but are willing to release a new version if/when we 
need one.
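
Until a ZkClient release with those timeouts is available, the "extra thread"
workaround mentioned earlier in this thread can be sketched roughly as below.
This is only an illustration in plain Java, not the API added by the pull
request: BoundedCall and callWithTimeout are hypothetical names, and the
sleeping lambda merely stands in for a blocking call such as
ZkUtils.getController().

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {

    /**
     * Runs a blocking call on a helper thread and waits at most timeoutMs.
     * Returns empty if the call times out, fails, is interrupted, or
     * returns null.
     */
    public static <T> Optional<T> callWithTimeout(Callable<T> blockingCall, long timeoutMs) {
        // Daemon thread, so a call that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "bounded-call");
            thread.setDaemon(true);
            return thread;
        });
        Future<T> future = executor.submit(blockingCall);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupts the helper; this only helps if the call honours interrupts.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a read that blocks because zookeeper is down.
        Optional<String> controller = callWithTimeout(() -> {
            Thread.sleep(60_000);
            return "unreachable";
        }, 200);
        System.out.println(controller.isPresent()
                ? "controller: " + controller.get()
                : "gave up waiting, continuing shutdown");
    }
}
```

The caller gets control back after the timeout even if the underlying call
never returns, which is why a timeout inside ZkClient itself is the cleaner
fix: with this workaround the helper thread may stay blocked in the
background.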

-Jaikiran

On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> So I think the current plan is:
> 1. Add timeout in zkclient
> 2. Ask zkclient to release a new version (we need it for a few other things too)
> 3. Rebase on new zkclient
> 4. Fix this jira and the few others that were waiting for the new zkclient
>
> Does that make sense?
>
> Gwen
>
> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <ja...@gmail.com> wrote:
>> I just heard back from Stefan, who manages the ZkClient repo and he seems to
>> be open to have these changes be part of ZkClient project. I'll be creating
>> a pull request for that project to have it reviewed and merged. Although I
>> haven't heard of exact release plans, Stefan's reply did indicate that the
>> project could be released after this change is merged.
>>
>> -Jaikiran
>>
>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
>>> Thanks for pointing to that repo!
>>>
>>> I just had a look at it and it appears that the project isn't very active
>>> (going by the lack of activity). The latest contribution is from Gwen and
>>> that was around 3 months back. I haven't found release plans for that
>>> project or a place to ask about it (filing an issue doesn't seem right to
>>> ask this question). So I'll get in touch with the repo owner and see what
>>> his plans for the project are.
>>>
>>> -Jaikiran
>>>
>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
>>>> I did!
>>>>
>>>> Thanks for clarifying :)
>>>>
>>>> The client that is part of Zookeeper itself actually does support
>>>> timeouts.
>>>>
>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wa...@gmail.com> wrote:
>>>>> Hi Jaikiran,
>>>>>
>>>>> I think Gwen was talking about contributing to ZkClient project:
>>>>>
>>>>> https://github.com/sgroschupf/zkclient
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <ja...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Gwen,
>>>>>>
>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
>>>>>> replacement.
>>>>>>
>>>>>> As for contributing to Zookeeper, yes, that is indeed on my mind, but I
>>>>>> haven't yet had a chance to really look deeper into Zookeeper or get in
>>>>>> touch with their dev team to try and explain this potential improvement
>>>>>> to
>>>>>> them. I have no objection to contributing this or something similar to
>>>>>> Zookeeper directly. I think I should be able to bring this up in the
>>>>>> Zookeeper dev forum, sometime soon in the next few weekends.
>>>>>>
>>>>>> -Jaikiran
>>>>>>
>>>>>>
>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>>>>>
>>>>>>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>>>>>>> not a replacement. Did I get it right?
>>>>>>>
>>>>>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>>>>>>> can also use one.
>>>>>>>
>>>>>>> However, I'm wondering why not contribute the fix directly to ZKClient
>>>>>>> project and ask for a release that contains the fix?
>>>>>>> This will benefit other users of the project who may also need a
>>>>>>> timeout (that's pretty basic...)
>>>>>>>
>>>>>>> As an alternative, if we don't want to collaborate with ZKClient for
>>>>>>> some reason, forking the project into Kafka will probably give us more
>>>>>>> control than wrappers and without much downside.
>>>>>>>
>>>>>>> Just a thought.
>>>>>>>
>>>>>>> Gwen
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
>>>>>>> <ja...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded
>>>>>>>> here
>>>>>>>> https://reviews.apache.org/r/30477/. It solves the shutdown problem
>>>>>>>> and
>>>>>>>> now
>>>>>>>> the server shuts down even when Zookeeper has gone down before the
>>>>>>>> Kafka
>>>>>>>> server.
>>>>>>>>
>>>>>>>> I went with the approach of introducing a custom (enhanced) ZkClient
>>>>>>>> which
>>>>>>>> for now allows time outs to be optionally specified for certain
>>>>>>>> operations.
>>>>>>>> I intentionally haven't forced the use of this new KafkaZkClient all
>>>>>>>> over
>>>>>>>> the code and instead for now have just used it in the KafkaServer.
>>>>>>>>
>>>>>>>> Does this patch look like something worth using?
>>>>>>>>
>>>>>>>> -Jaikiran
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>>>>>>
>>>>>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for this
>>>>>>>>> seems
>>>>>>>>> to be patching ZkClient. At some point, if we find ourselves
>>>>>>>>> fiddling
>>>>>>>>> too
>>>>>>>>> much with ZkClient, it wouldn't hurt to write our own little
>>>>>>>>> zookeeper
>>>>>>>>> client wrapper.
>>>>>>>>>
>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>>>>>>>>> <ew...@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>    Looks like a bug to me -- the underlying ZK library wraps a lot of
>>>>>>>>>> blocking
>>>>>>>>>> method implementations with waitUntilConnected() calls without any
>>>>>>>>>> timeouts. Ideally we could just add a version of
>>>>>>>>>> ZkUtils.getController()
>>>>>>>>>> with a timeout, but I don't see an easy way to accomplish that with
>>>>>>>>>> ZkClient.
>>>>>>>>>>
>>>>>>>>>> There's at least one other call to ZkUtils besides the one in the
>>>>>>>>>> stacktrace you gave that would cause the same issue, possibly more
>>>>>>>>>> that
>>>>>>>>>> aren't directly called in that method. One ugly solution would be
>>>>>>>>>> to
>>>>>>>>>> use
>>>>>>>>>> an
>>>>>>>>>> extra thread during shutdown to trigger timeouts, but I'd imagine
>>>>>>>>>> we
>>>>>>>>>> probably have other threads that could end up blocking in similar
>>>>>>>>>> ways.
>>>>>>>>>>
>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track
>>>>>>>>>> the
>>>>>>>>>> issue.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
>>>>>>>>>> jai.forums2013@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>    The main culprit is this thread which goes into "forever retry
>>>>>>>>>>> connection
>>>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a Ctrl + C)
>>>>>>>>>>> after
>>>>>>>>>>> zookeeper has already been shutdown. I have attached the complete
>>>>>>>>>>> thread
>>>>>>>>>>> dump, but I don't know if it will be delivered to the mailing
>>>>>>>>>>> list.
>>>>>>>>>>>
>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>>>>>>        java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
>>>>>>>>>>>         - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>>>>>         at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>>>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>>>>>>         at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>>>>>         at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>>>>>         at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>>>>>>         at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>>>>>         at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>>>>>>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>>>>>         at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>>>>>         at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>>>>>
>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>>>>>>
>>>>>>>>>>>> For a clean shutdown, the broker tries to talk to the controller and
>>>>>>>>>>>> also issues reads to zookeeper. Possibly that is where it tries to
>>>>>>>>>>>> reconnect to zk. It will help to look at the thread dump.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Neha
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2013@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that if I
>>>>>>>>>>>>> shutdown zookeeper first I can't shutdown Kafka server at all since it
>>>>>>>>>>>>> goes into a never ending attempt to reconnect with zookeeper. I had to
>>>>>>>>>>>>> kill the Kafka process to stop it. I tried it against trunk too and
>>>>>>>>>>>>> there too I see the same issue. Should I file a JIRA for this and see
>>>>>>>>>>>>> if I can come up with a patch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at trying to
>>>>>>>>>>>>> reconnect. I've a thread dump too which shows that the other thread
>>>>>>>>>>>>> which is trying to complete a controlled shutdown of Kafka is blocked
>>>>>>>>>>>>> forever for the zookeeper to be up. I can attach it to the JIRA.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null,
>>>>>>>>>>>>> unexpected error, closing socket connection and attempting reconnect
>>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>>      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>>      at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server
>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null,
>>>>>>>>>>>>> unexpected error, closing socket connection and attempting reconnect
>>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>>      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>>      at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server
>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null,
>>>>>>>>>>>>> unexpected error, closing socket connection and attempting reconnect
>>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>>      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>>      at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server
>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null,
>>>>>>>>>>>>> unexpected error, closing socket connection and attempting reconnect
>>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>>>>>      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>>>>>      at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>      at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    --
>>>>>>>>>> Thanks,
>>>>>>>>>> Ewen
>>>>>>>>>>
>>>>>>>>>>
>>>>> --
>>>>> -- Guozhang
>>>


Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Neha Narkhede <ne...@confluent.io>.
Since we have heard back from Stefan on this issue, we should just ask him
to release another version of ZkClient with the fixes we want.

It may be better to take the discussion of whether or not we should replace
ZkClient with Curator to its own KIP thread. In general, there are several
issues we have seen with the lack of flexibility in the ZkClient APIs.
Moving to Curator will not solve those problems and will add another layer
that we don't really understand. For something like ZooKeeper, it is worth
being as close to the raw ZooKeeper client as possible. In any case, we can
discuss the details when there is a KIP for this.

On Mon, Feb 9, 2015 at 7:34 PM, Jaikiran Pai <ja...@gmail.com>
wrote:

> Hi,
>
> I don't have enough context to know if replacing ZkClient is important
> right now. However, I did take a look at the code to see how extensively
> ZkClient gets used and I agree with Gwen that replacing it is a bigger task
> and will need further testing too, to ensure they don't have issues of
> their own, which affect us.
>
> IMO, using the newer version of ZkClient which the ZkClient team are
> willing to release might be a good idea for resolving the immediate issues
> at hand. So I think running certain tests, using the current dev/snapshot
> version of ZkClient, to verify that the JIRAs that we expect to be resolved
> are indeed resolved and then asking the ZkClient team to do a release might
> be something that we should do. If this sounds good and if someone can
> point me to the exact JIRAs that need to be verified, then I can look into
> this. Let me know.
>
> -Jaikiran
>
> On Thursday 05 February 2015 02:51 AM, Gwen Shapira wrote:
>
>> Hi,
>>
>> KAFKA-1155 is likely Zookeeper and not the specific client.
>> I believe the rest are already fixed in ZKClient and its a matter of
>> asking
>> them to release, rebase our code and make sure the issues are resolved (or
>> that we use the features ZKClient added to resolve them).
>>
>> I'm a fan of Curator, but its not exactly a drop-in replacement for
>> ZKClient (the APIs are slightly different, if we even decide to just use
>> the APIs and not the recipes). I suspect that replacing ZKClient with
>> Curator is a large project. Perhaps too large to resolve 3 issues that are
>> already resolved in ZKClient.
>>
>> What are the benefits you guys see in the replacement?
>>
>> Gwen
>>
>>
>> On Tue, Feb 3, 2015 at 10:42 PM, Guozhang Wang <wa...@gmail.com>
>> wrote:
>>
>>> Now may be a good time.
>>>
>>> We could verify if Curator has fixed the known issues we have seen so
>>> far, an incomplete list would be:
>>>
>>> KAFKA-1082 <https://issues.apache.org/jira/browse/KAFKA-1082>
>>> KAFKA-1155 <https://issues.apache.org/jira/browse/KAFKA-1155>
>>> KAFKA-1907 <https://issues.apache.org/jira/browse/KAFKA-1907>
>>> KAFKA-992 <https://issues.apache.org/jira/browse/KAFKA-992>
>>>
>>> Guozhang
>>>
>>> On Tue, Feb 3, 2015 at 10:21 PM, Ashish Singh <as...@cloudera.com>
>>> wrote:
>>>
>>>> +1 on using curator.
>>>>
>>>> On Tue, Feb 3, 2015 at 10:09 PM, Manikumar Reddy <ku...@nmsworks.co.in>
>>>> wrote:
>>>>
>>>>> I think we should consider to moving to apache curator (KAFKA-873).
>>>>> Curator is now more mature and a apache top-level project.
>>>>>
>>>>> On Wed, Feb 4, 2015 at 11:29 AM, Harsha <ka...@harsha.io> wrote:
>>>>>
>>>>>> Any reason not to go with apache curator http://curator.apache.org/ .
>>>>>> -Harsha
>>>>>>
>>>>>> On Tue, Feb 3, 2015, at 09:55 PM, Guozhang Wang wrote:
>>>>>>
>>>>>>> I am also +1 on Neha's suggestion that "At some point, if we find
>>>>>>> ourselves fiddling too much with ZkClient, it wouldn't hurt to write
>>>>>>> our own little zookeeper client wrapper." since we have accumulated a
>>>>>>> bunch of issues with zkClient which takes long time be resolved if
>>>>>>> ever, so we ended up have some hacky way handling zkClient errors.
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>> On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <jai.forums2013@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes, that's the plan :)
>>>>>>>>
>>>>>>>> -Jaikiran
>>>>>>>>
>>>>>>>> On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
>>>>>>>>
>>>>>>>>> So I think the current plan is:
>>>>>>>>> 1. Add timeout in zkclient
>>>>>>>>> 2. Ask zkclient to release new version (we need it for few other
>>>>>>>>> things too)
>>>>>>>>> 3. Rebase on new zkclient
>>>>>>>>> 4. Fix this jira and the few others than were waiting for the new
>>>>>>>>> zkclient
>>>>>>>>>
>>>>>>>>> Does that make sense?
>>>>>>>>>
>>>>>>>>> Gwen
>>>>>>>>>
>>>>>>>>> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <jai.forums2013@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I just heard back from Stefan, who manages the ZkClient repo and he
>>>>>>>>>> seems to be open to have these changes be part of ZkClient project.
>>>>>>>>>> I'll be creating a pull request for that project to have it reviewed
>>>>>>>>>> and merged. Although I haven't heard of exact release plans, Stefan's
>>>>>>>>>> reply did indicate that the project could be released after this
>>>>>>>>>> change is merged.
>>>>>>>>>>
>>>>>>>>>> -Jaikiran
>>>>>>>>>>
>>>>>>>>>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for pointing to that repo!
>>>>>>>>>>>
>>>>>>>>>>> I just had a look at it and it appears that the project isn't much
>>>>>>>>>>> active (going by the lack of activity). The latest contribution is
>>>>>>>>>>> from Gwen and that was around 3 months back. I haven't found release
>>>>>>>>>>> plans for that project or a place to ask about it (filing an issue
>>>>>>>>>>> doesn't seem right to ask this question). So I'll get in touch with
>>>>>>>>>>> the repo owner and see what his plans for the project are.
>>>>>>>>>>>
>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>
>>>>>>>>>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I did!
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for clarifying :)
>>>>>>>>>>>>
>>>>>>>>>>>> The client that is part of Zookeeper itself actually does support
>>>>>>>>>>>> timeouts.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wangguoz@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Jaikiran,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think Gwen was talking about contributing to ZkClient project:
>>>>>>>>>>>>> https://github.com/sgroschupf/zkclient
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <jai.forums2013@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Gwen,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a
>>>>>>>>>>>>>> complete replacement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for contributing to Zookeeper, yes that indeed in on my mind, but
>>>>>>>>>>>>>> I haven't yet had a chance to really look deeper into Zookeeper or
>>>>>>>>>>>>>> get in touch with their dev team to try and explain this potential
>>>>>>>>>>>>>> improvement to them. I have no objection to contributing this or
>>>>>>>>>>>>>> something similar to Zookeeper directly. I think I should be able to
>>>>>>>>>>>>>> bring this up in the Zookeeper dev forum, sometime soon in the next
>>>>>>>>>>>>>> few weekends.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like the new KafkaZkClient is a wrapper around ZkClient,
>>>>>>>>>>>>>>> but not a replacement. Did I get it right?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think a wrapper for ZkClient can be useful - for example
>>>>>>>>>>>>>>> KAFKA-1664 can also use one.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, I'm wondering why not contribute the fix directly to
>>>>>>>>>>>>>>> ZKClient project and ask for a release that contains the fix?
>>>>>>>>>>>>>>> This will benefit other users of the project who may also need a
>>>>>>>>>>>>>>> timeout (thats pretty basic...)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As an alternative, if we don't want to collaborate with ZKClient
>>>>>>>>>>>>>>> for some reason, forking the project into Kafka will probably give
>>>>>>>>>>>>>>> us more control than wrappers and without much downside.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just a thought.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Gwen
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <ja...@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Neha, Ewen (and others), my initial attempt to solve this is
>>>>>>>>>>>>>>>> uploaded here https://reviews.apache.org/r/30477/. It solves the
>>>>>>>>>>>>>>>> shutdown problem and now the server shuts down even when Zookeeper
>>>>>>>>>>>>>>>> has gone down before the Kafka server.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I went with the approach of introducing a custom (enhanced)
>>>>>>>>>>>>>>>> ZkClient which for now allows time outs to be optionally specified
>>>>>>>>>>>>>>>> for certain operations. I intentionally haven't forced the use of
>>>>>>>>>>>>>>>> this new KafkaZkClient all over the code and instead for now have
>>>>>>>>>>>>>>>> just used it in the KafkaServer.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does this patch look like something worth using?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for
>>>>>>>>>>>>>>>>> this seems to be patching ZkClient. At some point, if we find
>>>>>>>>>>>>>>>>> ourselves fiddling too much with ZkClient, it wouldn't hurt to
>>>>>>>>>>>>>>>>> write our own little zookeeper client wrapper.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>>>>>>>>>>>>>>>>> <ew...@confluent.io> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot
>>>>>>>>>>>>>>>>>> of blocking method implementations with waitUntilConnected()
>>>>>>>>>>>>>>>>>> calls without any timeouts. Ideally we could just add a version
>>>>>>>>>>>>>>>>>> of ZkUtils.getController() with a timeout, but I don't see an
>>>>>>>>>>>>>>>>>> easy way to accomplish that with ZkClient.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> There's at least one other call to ZkUtils besides the one in
>>>>>>>>>>>>>>>>>> the stacktrace you gave that would cause the same issue,
>>>>>>>>>>>>>>>>>> possibly more that aren't directly called in that method. One
>>>>>>>>>>>>>>>>>> ugly solution would be to use an extra thread during shutdown
>>>>>>>>>>>>>>>>>> to trigger timeouts, but I'd imagine we probably have other
>>>>>>>>>>>>>>>>>> threads that could end up blocking in similar ways.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to
>>>>>>>>>>>>>>>>>> track the issue.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai
>>>>>>>>>>>>>>>>>> <jai.forums2013@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The main culprit is this thread which goes into "forever retry
>>>>>>>>>>>>>>>>>>> connection to a closed zookeeper" when I shutdown Kafka (via a
>>>>>>>>>>>>>>>>>>> Ctrl + C) after zookeeper has already been shutdown. I have
>>>>>>>>>>>>>>>>>>> attached the complete thread dump, but I don't know if it will
>>>>>>>>>>>>>>>>>>> be delivered to the mailing list.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>>>>>>>>>>>>>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>>>>>>>>>>>>>     at sun.misc.Unsafe.park(Native Method)
>>>>>>>>>>>>>>>>>>>     - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>>>>>>>>>>>>>     at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>>>>>>>>>>>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>>>>>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>>>>>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>>>>>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>>>>>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>>>>>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>>>>>>>>>>>>>>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>>>>>>>>>>>>>>     at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>>>>>>>>>>>>>     at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>>>>>>>>>>>>>     at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>>>>>>>>>>>>>>     at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>>>>>>>>>>>>>>     at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>>>>>>>>>>>>>     at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>>>>>>>>>>>>>>     at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>>>>>>>>>>>>>     at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>>>>>>>>>>>>>     at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>>>>>>>>>>>>>     at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>>>>>>>>>>>>>>     at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>>>>>>>>>>>>>>     at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For a clean shutdown, the broker tries to talk to the
>>>>>>>>>>>>>>>>>>>> controller and also issues reads to zookeeper. Possibly that
>>>>>>>>>>>>>>>>>>>> is where it tries to reconnect to zk. It will help to look at
>>>>>>>>>>>>>>>>>>>> the thread dump.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>> Neha


-- 
Thanks,
Neha

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Jaikiran Pai <ja...@gmail.com>.
Hi,

I don't have enough context to know if replacing ZkClient is important 
right now. However, I did take a look at the code to see how extensively 
ZkClient gets used, and I agree with Gwen that replacing it is a bigger 
task that will need further testing to ensure the replacement doesn't 
have issues of its own that affect us.

IMO, using the newer version of ZkClient, which the ZkClient team is 
willing to release, would be a good way to resolve the immediate 
issues at hand. So I think we should run tests against the current 
dev/snapshot version of ZkClient to verify that the JIRAs we expect to 
be resolved are indeed resolved, and then ask the ZkClient team to do a 
release. If this sounds good and someone can point me to the exact JIRAs 
that need to be verified, I can look into this. Let me know.
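
For context, the main change to verify (KAFKA-1907) comes down to giving
the wait for a ZooKeeper connection an upper bound. Below is a minimal Java
sketch of that idea. The class BoundedConnectionWait and its methods are
hypothetical names made up for illustration; this is not ZkClient's actual
code or API, only the general pattern of a timed wait on a condition.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not part of ZkClient or Kafka.
final class BoundedConnectionWait {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    // Called by whatever observes the connection when it comes up.
    void markConnected() {
        lock.lock();
        try {
            connected = true;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits until connected or until the timeout elapses.
    // Returns true if connected, false if the wait timed out.
    boolean waitUntilConnected(long timeout, TimeUnit unit) throws InterruptedException {
        long remainingNanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!connected) {
                if (remainingNanos <= 0L) {
                    return false; // give up instead of blocking forever
                }
                // awaitNanos returns an estimate of the time still left to wait
                remainingNanos = stateChanged.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

With something along these lines, a shutdown path could wait for a short,
fixed period and skip the ZooKeeper read when the wait returns false,
rather than retrying indefinitely.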

-Jaikiran

On Thursday 05 February 2015 02:51 AM, Gwen Shapira wrote:
> Hi,
>
> KAFKA-1155 is likely Zookeeper and not the specific client.
> I believe the rest are already fixed in ZKClient and its a matter of asking
> them to release, rebase our code and make sure the issues are resolved (or
> that we use the features ZKClient added to resolve them).
>
> I'm a fan of Curator, but its not exactly a drop-in replacement for
> ZKClient (the APIs are slightly different, if we even decide to just use
> the APIs and not the recipes). I suspect that replacing ZKClient with
> Curator is a large project. Perhaps too large to resolve 3 issues that are
> already resolved in ZKClient.
>
> What are the benefits you guys see in the replacement?
>
> Gwen
>
>
> On Tue, Feb 3, 2015 at 10:42 PM, Guozhang Wang <wa...@gmail.com> wrote:
>
>> Now may be a good time.
>>
>> We could verify if Curator has fixed the known issues we have seen so far,
>> an incomplete list would be:
>>
>> KAFKA-1082 <https://issues.apache.org/jira/browse/KAFKA-1082>
>> KAFKA-1155 <https://issues.apache.org/jira/browse/KAFKA-1155>
>> KAFKA-1907 <https://issues.apache.org/jira/browse/KAFKA-1907>
>> KAFKA-992 <https://issues.apache.org/jira/browse/KAFKA-992>
>>
>>
>>
>> Guozhang
>>
>> On Tue, Feb 3, 2015 at 10:21 PM, Ashish Singh <as...@cloudera.com> wrote:
>>
>>> +1 on using curator.
>>>
>>> On Tue, Feb 3, 2015 at 10:09 PM, Manikumar Reddy <ku...@nmsworks.co.in>
>>> wrote:
>>>
>>>> I think we should consider to moving to apache curator (KAFKA-873).
>>>> Curator is now more mature and a apache top-level project.
>>>>
>>>> On Wed, Feb 4, 2015 at 11:29 AM, Harsha <ka...@harsha.io> wrote:
>>>>
>>>>> Any reason not to go with apache curator http://curator.apache.org/ .
>>>>> -Harsha
>>>>>
>>>>> On Tue, Feb 3, 2015, at 09:55 PM, Guozhang Wang wrote:
>>>>>
>>>>>> I am also +1 on Neha's suggestion that "At some point, if we find
>>>>>> ourselves fiddling too much with ZkClient, it wouldn't hurt to write
>>>>>> our own little zookeeper client wrapper." since we have accumulated a
>>>>>> bunch of issues with zkClient which takes long time be resolved if
>>>>>> ever, so we ended up have some hacky way handling zkClient errors.
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>> On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <jai.forums2013@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes, that's the plan :)
>>>>>>>
>>>>>>> -Jaikiran
>>>>>>>
>>>>>>> On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
>>>>>>>
>>>>>>>> So I think the current plan is:
>>>>>>>> 1. Add timeout in zkclient
>>>>>>>> 2. Ask zkclient to release new version (we need it for few other
>>>>>>>> things too)
>>>>>>>> 3. Rebase on new zkclient
>>>>>>>> 4. Fix this jira and the few others than were waiting for the new
>>>>>>>> zkclient
>>>>>>>>
>>>>>>>> Does that make sense?
>>>>>>>>
>>>>>>>> Gwen
>>>>>>>>
>>>>>>>> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <jai.forums2013@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I just heard back from Stefan, who manages the ZkClient repo and he
>>>>>>>>> seems to be open to have these changes be part of ZkClient project.
>>>>>>>>> I'll be creating a pull request for that project to have it reviewed
>>>>>>>>> and merged. Although I haven't heard of exact release plans, Stefan's
>>>>>>>>> reply did indicate that the project could be released after this
>>>>>>>>> change is merged.
>>>>>>>>>
>>>>>>>>> -Jaikiran
>>>>>>>>>
>>>>>>>>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for pointing to that repo!
>>>>>>>>>>
>>>>>>>>>> I just had a look at it and it appears that the project isn't much
>>>>>>>>>> active (going by the lack of activity). The latest contribution is
>>>>>>>>>> from Gwen and that was around 3 months back. I haven't found release
>>>>>>>>>> plans for that project or a place to ask about it (filing an issue
>>>>>>>>>> doesn't seem right to ask this question). So I'll get in touch with
>>>>>>>>>> the repo owner and see what his plans for the project are.
>>>>>>>>>>
>>>>>>>>>> -Jaikiran
>>>>>>>>>>
>>>>>>>>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
>>>>>>>>>>
>>>>>>>>>>> I did!
>>>>>>>>>>>
>>>>>>>>>>> Thanks for clarifying :)
>>>>>>>>>>>
>>>>>>>>>>> The client that is part of Zookeeper itself actually does
>>> support
>>>>>>>>>>> timeouts.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <
>>>> wangguoz@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Jaikiran,
>>>>>>>>>>>>
>>>>>>>>>>>> I think Gwen was talking about contributing to ZkClient
>>> project:
>>>>>>>>>>>> https://github.com/sgroschupf/zkclient
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <
>>>>>>>>>>>> jai.forums2013@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>   Hi Gwen,
>>>>>>>>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and
>> not a
>>>>>>>>>>>>> complete
>>>>>>>>>>>>> replacement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for contributing to Zookeeper, yes that indeed in on my
>>>> mind,
>>>>> but
>>>>>>>>>>>>> I
>>>>>>>>>>>>> haven't yet had a chance to really look deeper into
>> Zookeeper
>>>> or
>>>>> get
>>>>>>>>>>>>> in
>>>>>>>>>>>>> touch with their dev team to try and explain this potential
>>>>>>>>>>>>> improvement
>>>>>>>>>>>>> to
>>>>>>>>>>>>> them. I have no objection to contributing this or something
>>>>> similar
>>>>>>>>>>>>> to
>>>>>>>>>>>>> Zookeeper directly. I think I should be able to bring this
>> up
>>>> in
>>>>> the
>>>>>>>>>>>>> Zookeeper dev forum, sometime soon in the next few
>> weekends.
>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   It looks like the new KafkaZkClient is a wrapper around
>>>>> ZkClient,
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>> not a replacement. Did I get it right?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think a wrapper for ZkClient can be useful - for example
>>>>>>>>>>>>>> KAFKA-1664
>>>>>>>>>>>>>> can also use one.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, I'm wondering why not contribute the fix directly
>>> to
>>>>>>>>>>>>>> ZKClient
>>>>>>>>>>>>>> project and ask for a release that contains the fix?
>>>>>>>>>>>>>> This will benefit other users of the project who may also
>>>> need a
>>>>>>>>>>>>>> timeout (thats pretty basic...)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As an alternative, if we don't want to collaborate with
>>>>> ZKClient for
>>>>>>>>>>>>>> some reason, forking the project into Kafka will probably
>>> give
>>>>> us
>>>>>>>>>>>>>> more
>>>>>>>>>>>>>> control than wrappers and without much downside.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just a thought.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Gwen
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
>>>>>>>>>>>>>> <ja...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   Neha, Ewen (and others), my initial attempt to solve this
>>> is
>>>>>>>>>>>>>>> uploaded
>>>>>>>>>>>>>>> here
>>>>>>>>>>>>>>> https://reviews.apache.org/r/30477/. It solves the
>>> shutdown
>>>>>>>>>>>>>>> problem
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>> the server shuts down even when Zookeeper has gone down
>>>> before
>>>>> the
>>>>>>>>>>>>>>> Kafka
>>>>>>>>>>>>>>> server.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I went with the approach of introducing a custom
>> (enhanced)
>>>>>>>>>>>>>>> ZkClient
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>> for now allows time outs to be optionally specified for
>>>> certain
>>>>>>>>>>>>>>> operations.
>>>>>>>>>>>>>>> I intentionally haven't forced the use of this new
>>>>> KafkaZkClient
>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>> over
>>>>>>>>>>>>>>> the code and instead for now have just used it in the
>>>>> KafkaServer.
>>>>>>>>>>>>>>> Does this patch look like something worth using?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede
>> wrote:
>>>>>>>>>>>>>>>   Ewen is right. ZkClient APIs are blocking and the right
>>> fix
>>>>> for
>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>> seems
>>>>>>>>>>>>>>>> to be patching ZkClient. At some point, if we find
>>> ourselves
>>>>>>>>>>>>>>>> fiddling
>>>>>>>>>>>>>>>> too
>>>>>>>>>>>>>>>> much with ZkClient, it wouldn't hurt to write our own
>>> little
>>>>>>>>>>>>>>>> zookeeper
>>>>>>>>>>>>>>>> client wrapper.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>>>>>>>>>>>>>>>> <ew...@confluent.io>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     Looks like a bug to me -- the underlying ZK library
>>>> wraps a
>>>>>>>>>>>>>>>> lot of
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> blocking
>>>>>>>>>>>>>>>>> method implementations with waitUntilConnected() calls
>>>>> without
>>>>>>>>>>>>>>>>> any
>>>>>>>>>>>>>>>>> timeouts. Ideally we could just add a version of
>>>>>>>>>>>>>>>>> ZkUtils.getController()
>>>>>>>>>>>>>>>>> with a timeout, but I don't see an easy way to
>> accomplish
>>>>> that
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> ZkClient.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There's at least one other call to ZkUtils besides the
>>> one
>>>>> in the
>>>>>>>>>>>>>>>>> stacktrace you gave that would cause the same issue,
>>>> possibly
>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> aren't directly called in that method. One ugly
>> solution
>>>>> would be
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>> extra thread during shutdown to trigger timeouts, but
>> I'd
>>>>> imagine
>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> probably have other threads that could end up blocking
>> in
>>>>> similar
>>>>>>>>>>>>>>>>> ways.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I filed
>> https://issues.apache.org/jira/browse/KAFKA-1907
>>>> to
>>>>>>>>>>>>>>>>> track
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> issue.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
>>>>>>>>>>>>>>>>> jai.forums2013@gmail.com>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     The main culprit is this thread which goes into
>>> "forever
>>>>> retry
>>>>>>>>>>>>>>>>>> connection
>>>>>>>>>>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a
>>> Ctrl +
>>>>> C)
>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>> zookeeper has already been shutdown. I have attached
>> the
>>>>>>>>>>>>>>>>>> complete
>>>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>>>> dump, but I don't know if it will be delivered to the
>>>>> mailing
>>>>>>>>>>>>>>>>>> list.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting
>> on
>>>>>>>>>>>>>>>>>> condition
>>>>>>>>>>>>>>>>>> [0x6ad69000]
>>>>>>>>>>>>>>>>>>         java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>>>>>>>>>>>>          at sun.misc.Unsafe.park(Native Method)
>>>>>>>>>>>>>>>>>>          - parking to wait for  <0x70a93368> (a
>>>>>>>>>>>>>>>>>> java.util.concurrent.locks.
>>>>>>>>>>>>>>>>>> AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>>>>>>>>>>>>          at
>>>> java.util.concurrent.locks.LockSupport.parkUntil(
>>>>>>>>>>>>>>>>>> LockSupport.java:267)
>>>>>>>>>>>>>>>>>>          at java.util.concurrent.locks.
>>>>>>>>>>>>>>>>>> AbstractQueuedSynchronizer$
>>>>>>>>>>>>>>>>>> ConditionObject.awaitUntil(AbstractQueuedSynchronizer.
>>>>>>>>>>>>>>>>>> java:2130)
>>>>>>>>>>>>>>>>>>          at
>>>>>>>>>>>>>>>>>>
>>> org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.
>>>>>>>>>>>>>>>>>> java:636)
>>>>>>>>>>>>>>>>>>          at
>>>>>>>>>>>>>>>>>>
>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
>>>>>>>>>>>>>>>>>> java:619)
>>>>>>>>>>>>>>>>>>          at
>>>>>>>>>>>>>>>>>>
>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
>>>>>>>>>>>>>>>>>> java:615)
>>>>>>>>>>>>>>>>>>          at
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
>>>>>>>>>>>>>>>>> java:679)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>           at org.I0Itec.zkclient.ZkClient.
>>>>>>>>>>>>>>>>>> readData(ZkClient.java:766)
>>>>>>>>>>>>>>>>>>          at org.I0Itec.zkclient.ZkClient.
>>>>>>>>>>>>>>>>>> readData(ZkClient.java:761)
>>>>>>>>>>>>>>>>>>          at
>>>>>>>>>>>>>>>>>>
>>> kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>>>>>>>>>>>>          at
>>>>> kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>>>>>>>>>>>>          at
>>>>> kafka.server.KafkaServer.kafka$server$KafkaServer$$
>>>>>>>>>>>>>>>>>> controlledShutdown(KafkaServer.scala:194)
>>>>>>>>>>>>>>>>>>          at kafka.server.KafkaServer$$
>>>>>>>>>>>>>>>>>> anonfun$shutdown$1.apply$mcV$
>>>>>>>>>>>>>>>>>> sp(KafkaServer.scala:269)
>>>>>>>>>>>>>>>>>>          at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>>>>>>>>>>>>          at kafka.utils.Logging$class.
>>>>>>>>>>>>>>>>>> swallowWarn(Logging.scala:92)
>>>>>>>>>>>>>>>>>>          at
>>> kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>>>>>>>>>>>>          at
>>>>> kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>>>>>>>>>>>>          at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>>>>>>>>>>>>          at
>>>>> kafka.server.KafkaServer.shutdown(KafkaServer.scala:
>>>>>>>>>>>>>>>>>> 269)
>>>>>>>>>>>>>>>>>>          at kafka.server.KafkaServerStartable.shutdown(
>>>>>>>>>>>>>>>>>> KafkaServerStartable.scala:42)
>>>>>>>>>>>>>>>>>>          at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede
>> wrote:
>>>>>>>>>>>>>>>>>>     For a clean shutdown, the broker tries to talk to
>> the
>>>>>>>>>>>>>>>>>> controller
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>>>> issues reads to zookeeper. Possibly that is where it
>>> tries
>>>>> to
>>>>>>>>>>>>>>>>>>> reconnect
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   to
>>>>>>>>>>>>>>>>>> zk. It will help to look at the thread dump.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>> Neha
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
>>>>>>>>>>>>>>>>>>> jai.forums2013@gmail.com
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>       I was just playing around with the RC2 of 0.8.2
>>> and
>>>>>>>>>>>>>>>>>>> noticed
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> if I
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   shutdown zookeeper first I can't shutdown Kafka
>> server
>>>> at
>>>>> all
>>>>>>>>>>>>>>>>>>>> since
>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>> goes
>>>>>>>>>>>>>>>>>>>> into a never ending attempt to reconnect with
>>> zookeeper.
>>>>> I had
>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> kill
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> Kafka process to stop it. I tried it against trunk
>> too
>>>> and
>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>> too I
>>>>>>>>>>>>>>>>>>>> see
>>>>>>>>>>>>>>>>>>>> the same issue. Should I file a JIRA for this and
>> see
>>> if
>>>>> I can
>>>>>>>>>>>>>>>>>>>> come
>>>>>>>>>>>>>>>>>>>> up
>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>> a patch?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent)
>>>> attempts
>>>>> at
>>>>>>>>>>>>>>>>>>>> trying
>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> reconnect. I've a thread dump too which shows that
>> the
>>>>> other
>>>>>>>>>>>>>>>>>>>> thread
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   which
>>>>>>>>>>>>>>>>>> is trying to complete a controlled shutdown of Kafka
>> is
>>>>> blocked
>>>>>>>>>>>>>>>>>>> forever
>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> the zookeeper to be up. I can attach it to the JIRA.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2015-01-24 10:15:46,278] WARN Session
>>> 0x14b1a4136800000
>>>>> for
>>>>>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   null,
>>>>>>>>>>>>>>>>>> unexpected error, closing socket connection and
>>> attempting
>>>>>>>>>>>>>>>>>> reconnect
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>>>>>>>>           at
>>>>> sun.nio.ch.SocketChannelImpl.checkConnect(Native
>>>>>>>>>>>>>>>>>>>> Method)
>>>>>>>>>>>>>>>>>>>>           at
>>> sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>>>>>>>>>>           at
>> org.apache.zookeeper.ClientCnxnSocketNIO.
>>>>>>>>>>>>>>>>>>>> doTransport(
>>>>>>>>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>>>>>>>>           at
>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket
>>> connection
>>>>> to
>>>>>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
>>>>> authenticate
>>>>>>>>>>>>>>>>>>>> using
>>>>>>>>>>>>>>>>>>>> SASL
>>>>>>>>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session
>>> 0x14b1a4136800000
>>>>> for
>>>>>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   null,
>>>>>>>>>>>>>>>>>> unexpected error, closing socket connection and
>>> attempting
>>>>>>>>>>>>>>>>>> reconnect
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>>>>>>>>           at
>>>>> sun.nio.ch.SocketChannelImpl.checkConnect(Native
>>>>>>>>>>>>>>>>>>>> Method)
>>>>>>>>>>>>>>>>>>>>           at
>>> sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>>>>>>>>>>           at
>> org.apache.zookeeper.ClientCnxnSocketNIO.
>>>>>>>>>>>>>>>>>>>> doTransport(
>>>>>>>>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>>>>>>>>           at
>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket
>>> connection
>>>>> to
>>>>>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
>>>>> authenticate
>>>>>>>>>>>>>>>>>>>> using
>>>>>>>>>>>>>>>>>>>> SASL
>>>>>>>>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session
>>> 0x14b1a4136800000
>>>>> for
>>>>>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   null,
>>>>>>>>>>>>>>>>>> unexpected error, closing socket connection and
>>> attempting
>>>>>>>>>>>>>>>>>> reconnect
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>>>>>>>>           at
>>>>> sun.nio.ch.SocketChannelImpl.checkConnect(Native
>>>>>>>>>>>>>>>>>>>> Method)
>>>>>>>>>>>>>>>>>>>>           at
>>> sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>>>>>>>>>>           at
>> org.apache.zookeeper.ClientCnxnSocketNIO.
>>>>>>>>>>>>>>>>>>>> doTransport(
>>>>>>>>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>>>>>>>>           at
>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket
>>> connection
>>>>> to
>>>>>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
>>>>> authenticate
>>>>>>>>>>>>>>>>>>>> using
>>>>>>>>>>>>>>>>>>>> SASL
>>>>>>>>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session
>>> 0x14b1a4136800000
>>>>> for
>>>>>>>>>>>>>>>>>>>> server
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   null,
>>>>>>>>>>>>>>>>>> unexpected error, closing socket connection and
>>> attempting
>>>>>>>>>>>>>>>>>> reconnect
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>>>>>>>>>           at
>>>>> sun.nio.ch.SocketChannelImpl.checkConnect(Native
>>>>>>>>>>>>>>>>>>>> Method)
>>>>>>>>>>>>>>>>>>>>           at
>>> sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>>>>>>>>>>           at
>> org.apache.zookeeper.ClientCnxnSocketNIO.
>>>>>>>>>>>>>>>>>>>> doTransport(
>>>>>>>>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>>>>>>>>>>           at
>>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     --
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Ewen
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   --
>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> -- Guozhang
>>>
>>>
>>> --
>>>
>>> Regards,
>>> Ashish
>>>
>>
>>
>> --
>> -- Guozhang
>>


Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Gwen Shapira <gs...@cloudera.com>.
Hi,

KAFKA-1155 is likely a Zookeeper issue and not specific to the client.
I believe the rest are already fixed in ZKClient and it's a matter of asking
them to release, rebasing our code and making sure the issues are resolved (or
that we use the features ZKClient added to resolve them).
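
For illustration only, here is a minimal sketch of the kind of bounded wait
discussed in this thread for KAFKA-1907: running a blocking call on a helper
thread and giving up after a timeout. It uses only java.util.concurrent. The
class BoundedCall and the method callWithTimeout are hypothetical names, not
part of ZkClient, Curator or Kafka, and this is not the change that went into
ZkClient.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not a ZkClient or Kafka API.
public class BoundedCall {

    // Runs blockingCall on a helper thread and waits at most timeoutMs.
    // Returns fallback if the call times out, fails, or the caller is interrupted.
    public static <T> T callWithTimeout(Callable<T> blockingCall, long timeoutMs, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(blockingCall);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the helper thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw; treated like "no answer" here
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A call that answers quickly returns its own result.
        String fast = callWithTimeout(() -> "controller-1", 1000, "unknown");
        // A call that blocks (simulated with sleep) is abandoned after 200 ms.
        String slow = callWithTimeout(() -> {
            Thread.sleep(60_000);
            return "controller-1";
        }, 200, "unknown");
        System.out.println(fast + " " + slow);
    }
}
```

A wrapper like this only bounds how long the caller waits; the abandoned call
stops only if it responds to interruption, which is one reason a timeout
inside the client library itself is the cleaner fix.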

I'm a fan of Curator, but it's not exactly a drop-in replacement for
ZKClient (the APIs are slightly different, if we even decide to just use
the APIs and not the recipes). I suspect that replacing ZKClient with
Curator is a large project, perhaps too large to resolve 3 issues that are
already resolved in ZKClient.

What are the benefits you guys see in the replacement?

Gwen


On Tue, Feb 3, 2015 at 10:42 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Now may be a good time.
>
> We could verify if Curator has fixed the known issues we have seen so far,
> an incomplete list would be:
>
> KAFKA-1082 <https://issues.apache.org/jira/browse/KAFKA-1082>
> KAFKA-1155 <https://issues.apache.org/jira/browse/KAFKA-1155>
> KAFKA-1907 <https://issues.apache.org/jira/browse/KAFKA-1907>
> KAFKA-992 <https://issues.apache.org/jira/browse/KAFKA-992>
>
>
>
> Guozhang
>
> On Tue, Feb 3, 2015 at 10:21 PM, Ashish Singh <as...@cloudera.com> wrote:
>
> > +1 on using curator.
> >
> > On Tue, Feb 3, 2015 at 10:09 PM, Manikumar Reddy <ku...@nmsworks.co.in>
> > wrote:
> >
> > > > I think we should consider moving to Apache Curator (KAFKA-873).
> > > > Curator is now more mature and an Apache top-level project.
> > >
> > >
> > > On Wed, Feb 4, 2015 at 11:29 AM, Harsha <ka...@harsha.io> wrote:
> > >
> > > > > Any reason not to go with Apache Curator (http://curator.apache.org/)?
> > > > -Harsha
> > > > On Tue, Feb 3, 2015, at 09:55 PM, Guozhang Wang wrote:
> > > > > > I am also +1 on Neha's suggestion that "At some point, if we find
> > > > > > ourselves fiddling too much with ZkClient, it wouldn't hurt to write
> > > > > > our own little zookeeper client wrapper." We have accumulated a bunch
> > > > > > of issues with zkClient that take a long time to be resolved, if
> > > > > > ever, so we ended up with some hacky ways of handling zkClient errors.
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <
> > jai.forums2013@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Yes, that's the plan :)
> > > > > >
> > > > > > -Jaikiran
> > > > > >
> > > > > > On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> > > > > >
> > > > > >> So I think the current plan is:
> > > > > >> 1. Add timeout in zkclient
> > > > > > >> 2. Ask zkclient to release new version (we need it for few other
> > > > > > >> things too)
> > > > > > >> 3. Rebase on new zkclient
> > > > > > >> 4. Fix this jira and the few others that were waiting for the new
> > > > > > >> zkclient
> > > > > >>
> > > > > >> Does that make sense?
> > > > > >>
> > > > > >> Gwen
> > > > > >>
> > > > > >> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <
> > > > jai.forums2013@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > > >>> I just heard back from Stefan, who manages the ZkClient repo, and he
> > > > > > >>> seems to be open to having these changes be part of the ZkClient
> > > > > > >>> project. I'll be creating a pull request for that project to have it
> > > > > > >>> reviewed and merged. Although I haven't heard of exact release plans,
> > > > > > >>> Stefan's reply did indicate that the project could be released after
> > > > > > >>> this change is merged.
> > > > > >>>
> > > > > >>> -Jaikiran
> > > > > >>>
> > > > > >>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
> > > > > >>>
> > > > > >>>> Thanks for pointing to that repo!
> > > > > >>>>
> > > > > > >>>> I just had a look at it and it appears that the project isn't very
> > > > > > >>>> active (going by the lack of activity). The latest contribution is
> > > > > > >>>> from Gwen and that was around 3 months back. I haven't found release
> > > > > > >>>> plans for that project or a place to ask about it (filing an issue
> > > > > > >>>> doesn't seem the right way to ask this question). So I'll get in
> > > > > > >>>> touch with the repo owner and see what his plans for the project are.
> > > > > >>>>
> > > > > >>>> -Jaikiran
> > > > > >>>>
> > > > > >>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
> > > > > >>>>
> > > > > >>>>> I did!
> > > > > >>>>>
> > > > > >>>>> Thanks for clarifying :)
> > > > > >>>>>
> > > > > > >>>>> The client that is part of Zookeeper itself actually does support
> > > > > > >>>>> timeouts.
> > > > > >>>>>
> > > > > >>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <
> > > wangguoz@gmail.com>
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>>> Hi Jaikiran,
> > > > > >>>>>>
> > > > > > >>>>>> I think Gwen was talking about contributing to the ZkClient project:
> > > > > >>>>>>
> > > > > >>>>>> https://github.com/sgroschupf/zkclient
> > > > > >>>>>>
> > > > > >>>>>> Guozhang
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <
> > > > > >>>>>> jai.forums2013@gmail.com>
> > > > > >>>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>  Hi Gwen,
> > > > > >>>>>>>
> > > > > > >>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a
> > > > > > >>>>>>> complete replacement.
> > > > > > >>>>>>>
> > > > > > >>>>>>> As for contributing to Zookeeper, yes, that is indeed on my mind, but
> > > > > > >>>>>>> I haven't yet had a chance to really look deeper into Zookeeper or
> > > > > > >>>>>>> get in touch with their dev team to try and explain this potential
> > > > > > >>>>>>> improvement to them. I have no objection to contributing this or
> > > > > > >>>>>>> something similar to Zookeeper directly. I think I should be able to
> > > > > > >>>>>>> bring this up in the Zookeeper dev forum sometime in the next few
> > > > > > >>>>>>> weekends.
> > > > > >>>>>>>
> > > > > >>>>>>> -Jaikiran
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
> > > > > >>>>>>>
> > > > > > >>>>>>>> It looks like the new KafkaZkClient is a wrapper around ZkClient,
> > > > > > >>>>>>>> but not a replacement. Did I get it right?
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> I think a wrapper for ZkClient can be useful - for example
> > > > > > >>>>>>>> KAFKA-1664 can also use one.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> However, I'm wondering why not contribute the fix directly to the
> > > > > > >>>>>>>> ZKClient project and ask for a release that contains the fix?
> > > > > > >>>>>>>> This will benefit other users of the project who may also need a
> > > > > > >>>>>>>> timeout (that's pretty basic...)
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> As an alternative, if we don't want to collaborate with ZKClient for
> > > > > > >>>>>>>> some reason, forking the project into Kafka will probably give us
> > > > > > >>>>>>>> more control than wrappers and without much downside.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Just a thought.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Gwen
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
> > > > > >>>>>>>> <ja...@gmail.com>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > > >>>>>>>>> Neha, Ewen (and others), my initial attempt to solve this is
> > > > > > >>>>>>>>> uploaded here: https://reviews.apache.org/r/30477/. It solves the
> > > > > > >>>>>>>>> shutdown problem, and now the server shuts down even when Zookeeper
> > > > > > >>>>>>>>> has gone down before the Kafka server.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> I went with the approach of introducing a custom (enhanced) ZkClient
> > > > > > >>>>>>>>> which for now allows timeouts to be optionally specified for certain
> > > > > > >>>>>>>>> operations. I intentionally haven't forced the use of this new
> > > > > > >>>>>>>>> KafkaZkClient all over the code and instead for now have just used
> > > > > > >>>>>>>>> it in the KafkaServer.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Does this patch look like something worth using?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> -Jaikiran
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede
> wrote:
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for
> > > > > > >>>>>>>>>> this seems to be patching ZkClient. At some point, if we find
> > > > > > >>>>>>>>>> ourselves fiddling too much with ZkClient, it wouldn't hurt to
> > > > > > >>>>>>>>>> write our own little zookeeper client wrapper.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
> > > > > >>>>>>>>>> <ew...@confluent.io>
> > > > > >>>>>>>>>> wrote:
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of
> > > > > > >>>>>>>>>>> blocking method implementations with waitUntilConnected() calls
> > > > > > >>>>>>>>>>> without any timeouts. Ideally we could just add a version of
> > > > > > >>>>>>>>>>> ZkUtils.getController() with a timeout, but I don't see an easy way
> > > > > > >>>>>>>>>>> to accomplish that with ZkClient.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> There's at least one other call to ZkUtils besides the one in the
> > > > > > >>>>>>>>>>> stacktrace you gave that would cause the same issue, possibly more
> > > > > > >>>>>>>>>>> that aren't directly called in that method. One ugly solution would
> > > > > > >>>>>>>>>>> be to use an extra thread during shutdown to trigger timeouts, but
> > > > > > >>>>>>>>>>> I'd imagine we probably have other threads that could end up
> > > > > > >>>>>>>>>>> blocking in similar ways.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track
> > > > > > >>>>>>>>>>> the issue.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
> > > > > >>>>>>>>>>> jai.forums2013@gmail.com>
> > > > > >>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>>> The main culprit is this thread which goes into "forever retry
> > > > > > >>>>>>>>>>>> connection to a closed zookeeper" when I shutdown Kafka (via a
> > > > > > >>>>>>>>>>>> Ctrl + C) after zookeeper has already been shutdown. I have attached
> > > > > > >>>>>>>>>>>> the complete thread dump, but I don't know if it will be delivered
> > > > > > >>>>>>>>>>>> to the mailing list.
> > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
> > > > > > >>>>>>>>>>>>    java.lang.Thread.State: TIMED_WAITING (parking)
> > > > > > >>>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
> > > > > > >>>>>>>>>>>>         - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > > > > > >>>>>>>>>>>>         at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
> > > > > > >>>>>>>>>>>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
> > > > > > >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
> > > > > > >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
> > > > > > >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
> > > > > > >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
> > > > > > >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> > > > > > >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> > > > > > >>>>>>>>>>>>         at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
> > > > > > >>>>>>>>>>>>         at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
> > > > > > >>>>>>>>>>>>         at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
> > > > > > >>>>>>>>>>>>         at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
> > > > > > >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
> > > > > > >>>>>>>>>>>>         at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
> > > > > > >>>>>>>>>>>>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> > > > > > >>>>>>>>>>>>         at kafka.utils.Logging$class.swallow(Logging.scala:94)
> > > > > > >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
> > > > > > >>>>>>>>>>>>         at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
> > > > > > >>>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
> > > > > > >>>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> -Jaikiran
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede
> wrote:
> > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> For a clean shutdown, the broker tries to talk to the controller
> > > > > > >>>>>>>>>>>>> and also issues reads to zookeeper. Possibly that is where it tries
> > > > > > >>>>>>>>>>>>> to reconnect to zk. It will help to look at the thread dump.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Thanks
> > > > > >>>>>>>>>>>>> Neha
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
> > > > > >>>>>>>>>>>>> jai.forums2013@gmail.com
> > > > > >>>>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that
> > > > > > >>>>>>>>>>>>>> if I shutdown zookeeper first I can't shutdown Kafka server at all
> > > > > > >>>>>>>>>>>>>> since it goes into a never ending attempt to reconnect with
> > > > > > >>>>>>>>>>>>>> zookeeper. I had to kill the Kafka process to stop it. I tried it
> > > > > > >>>>>>>>>>>>>> against trunk too and there too I see the same issue. Should I file
> > > > > > >>>>>>>>>>>>>> a JIRA for this and see if I can come up with a patch?
> > > > > > >>>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at trying
> > > > > > >>>>>>>>>>>>>> to reconnect. I've a thread dump too which shows that the other
> > > > > > >>>>>>>>>>>>>> thread which is trying to complete a controlled shutdown of Kafka
> > > > > > >>>>>>>>>>>>>> is blocked forever for the zookeeper to be up. I can attach it to
> > > > > > >>>>>>>>>>>>>> the JIRA.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> 2015-01-24 10:15:46,278] WARN Session
> > 0x14b1a4136800000
> > > > for
> > > > > >>>>>>>>>>>>>> server
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>  null,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>> unexpected error, closing socket connection and
> > attempting
> > > > > >>>>>>>>>>>> reconnect
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > > > >>>>>>>>>>>>>>          at
> > > > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > > > >>>>>>>>>>>>>> Method)
> > > > > >>>>>>>>>>>>>>          at
> > sun.nio.ch.SocketChannelImpl.finishConnect(
> > > > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > > > >>>>>>>>>>>>>>          at
> org.apache.zookeeper.ClientCnxnSocketNIO.
> > > > > >>>>>>>>>>>>>> doTransport(
> > > > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > > > >>>>>>>>>>>>>>          at
> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > > > >>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket
> > connection
> > > > to
> > > > > >>>>>>>>>>>>>> server
> > > > > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> > > > authenticate
> > > > > >>>>>>>>>>>>>> using
> > > > > >>>>>>>>>>>>>> SASL
> > > > > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > > >>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session
> > 0x14b1a4136800000
> > > > for
> > > > > >>>>>>>>>>>>>> server
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>  null,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>> unexpected error, closing socket connection and
> > attempting
> > > > > >>>>>>>>>>>> reconnect
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > > > >>>>>>>>>>>>>>          at
> > > > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > > > >>>>>>>>>>>>>> Method)
> > > > > >>>>>>>>>>>>>>          at
> > sun.nio.ch.SocketChannelImpl.finishConnect(
> > > > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > > > >>>>>>>>>>>>>>          at
> org.apache.zookeeper.ClientCnxnSocketNIO.
> > > > > >>>>>>>>>>>>>> doTransport(
> > > > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > > > >>>>>>>>>>>>>>          at
> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > > > >>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket
> > connection
> > > > to
> > > > > >>>>>>>>>>>>>> server
> > > > > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> > > > authenticate
> > > > > >>>>>>>>>>>>>> using
> > > > > >>>>>>>>>>>>>> SASL
> > > > > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > > >>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session
> > 0x14b1a4136800000
> > > > for
> > > > > >>>>>>>>>>>>>> server
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>  null,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>> unexpected error, closing socket connection and
> > attempting
> > > > > >>>>>>>>>>>> reconnect
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > > > >>>>>>>>>>>>>>          at
> > > > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > > > >>>>>>>>>>>>>> Method)
> > > > > >>>>>>>>>>>>>>          at
> > sun.nio.ch.SocketChannelImpl.finishConnect(
> > > > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > > > >>>>>>>>>>>>>>          at
> org.apache.zookeeper.ClientCnxnSocketNIO.
> > > > > >>>>>>>>>>>>>> doTransport(
> > > > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > > > >>>>>>>>>>>>>>          at
> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > > > >>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket
> > connection
> > > > to
> > > > > >>>>>>>>>>>>>> server
> > > > > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> > > > authenticate
> > > > > >>>>>>>>>>>>>> using
> > > > > >>>>>>>>>>>>>> SASL
> > > > > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > > >>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session
> > 0x14b1a4136800000
> > > > for
> > > > > >>>>>>>>>>>>>> server
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>  null,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>> unexpected error, closing socket connection and
> > attempting
> > > > > >>>>>>>>>>>> reconnect
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > > > >>>>>>>>>>>>>>          at
> > > > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > > > >>>>>>>>>>>>>> Method)
> > > > > >>>>>>>>>>>>>>          at
> > sun.nio.ch.SocketChannelImpl.finishConnect(
> > > > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > > > >>>>>>>>>>>>>>          at
> org.apache.zookeeper.ClientCnxnSocketNIO.
> > > > > >>>>>>>>>>>>>> doTransport(
> > > > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > > > >>>>>>>>>>>>>>          at
> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> -Jaikiran
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>    --
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Thanks,
> > > > > >>>>>>>>>>> Ewen
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>  --
> > > > > >>>>>> -- Guozhang
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > >
> > >
> >
> >
> >
> > --
> >
> > Regards,
> > Ashish
> >
>
>
>
> --
> -- Guozhang
>

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Guozhang Wang <wa...@gmail.com>.
Now may be a good time.

We could verify whether Curator has fixed the known issues we have seen so
far. An incomplete list would be:

KAFKA-1082 <https://issues.apache.org/jira/browse/KAFKA-1082>
KAFKA-1155 <https://issues.apache.org/jira/browse/KAFKA-1155>
KAFKA-1907 <https://issues.apache.org/jira/browse/KAFKA-1907>
KAFKA-992 <https://issues.apache.org/jira/browse/KAFKA-992>



Guozhang

On Tue, Feb 3, 2015 at 10:21 PM, Ashish Singh <as...@cloudera.com> wrote:

> +1 on using curator.
>
> On Tue, Feb 3, 2015 at 10:09 PM, Manikumar Reddy <ku...@nmsworks.co.in>
> wrote:
>
> > I think we should consider moving to Apache Curator (KAFKA-873).
> > Curator is now more mature and an Apache top-level project.
> >
> >
> > On Wed, Feb 4, 2015 at 11:29 AM, Harsha <ka...@harsha.io> wrote:
> >
> > > Any reason not to go with Apache Curator (http://curator.apache.org/)?
> > > -Harsha
> > > On Tue, Feb 3, 2015, at 09:55 PM, Guozhang Wang wrote:
> > > > > I am also +1 on Neha's suggestion that "At some point, if we find
> > > > > ourselves fiddling too much with ZkClient, it wouldn't hurt to write
> > > > > our own little zookeeper client wrapper." We have accumulated a bunch
> > > > > of issues with ZkClient that take a long time to be resolved, if
> > > > > ever, so we have ended up with some hacky ways of handling ZkClient
> > > > > errors.
> > > >
> > > > Guozhang
> > > >
> > > > On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <
> jai.forums2013@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Yes, that's the plan :)
> > > > >
> > > > > -Jaikiran
> > > > >
> > > > > On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> > > > >
> > > > >> So I think the current plan is:
> > > > >> 1. Add timeout in zkclient
> > > > > >> 2. Ask zkclient to release a new version (we need it for a few
> > > > > >> other things too)
> > > > > >> 3. Rebase on new zkclient
> > > > > >> 4. Fix this jira and the few others that were waiting for the new
> > > > > >> zkclient
> > > > >>
> > > > >> Does that make sense?
> > > > >>
> > > > >> Gwen
> > > > >>
> > > > >> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <
> > > jai.forums2013@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> I just heard back from Stefan, who manages the ZkClient repo and
> he
> > > > >>> seems to
> > > > >>> be open to have these changes be part of ZkClient project. I'll
> be
> > > > >>> creating
> > > > >>> a pull request for that project to have it reviewed and merged.
> > > Although
> > > > >>> I
> > > > >>> haven't heard of exact release plans, Stefan's reply did indicate
> > > that
> > > > >>> the
> > > > >>> project could be released after this change is merged.
> > > > >>>
> > > > >>> -Jaikiran
> > > > >>>
> > > > >>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
> > > > >>>
> > > > >>>> Thanks for pointing to that repo!
> > > > >>>>
> > > > > >>>> I just had a look at it and it appears that the project isn't
> > > > > >>>> very active (going by the lack of recent activity). The latest
> > > > > >>>> contribution is from Gwen and that was around 3 months back. I
> > > > > >>>> haven't found release plans for that project or a place to ask
> > > > > >>>> about it (filing an issue doesn't seem the right way to ask this
> > > > > >>>> question). So I'll get in touch with the repo owner and see what
> > > > > >>>> his plans for the project are.
> > > > >>>>
> > > > >>>> -Jaikiran
> > > > >>>>
> > > > >>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
> > > > >>>>
> > > > >>>>> I did!
> > > > >>>>>
> > > > >>>>> Thanks for clarifying :)
> > > > >>>>>
> > > > >>>>> The client that is part of Zookeeper itself actually does
> support
> > > > >>>>> timeouts.
> > > > >>>>>
> > > > >>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <
> > wangguoz@gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> Hi Jaikiran,
> > > > >>>>>>
> > > > >>>>>> I think Gwen was talking about contributing to ZkClient
> project:
> > > > >>>>>>
> > > > >>>>>> https://github.com/sgroschupf/zkclient
> > > > >>>>>>
> > > > >>>>>> Guozhang
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <
> > > > >>>>>> jai.forums2013@gmail.com>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>  Hi Gwen,
> > > > >>>>>>>
> > > > >>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a
> > > > >>>>>>> complete
> > > > >>>>>>> replacement.
> > > > >>>>>>>
> > > > > >>>>>>> As for contributing to Zookeeper, yes, that indeed is on my
> > > > > >>>>>>> mind, but I haven't yet had a chance to really look deeper
> > > > > >>>>>>> into Zookeeper or get in touch with their dev team to try and
> > > > > >>>>>>> explain this potential improvement to them. I have no
> > > > > >>>>>>> objection to contributing this or something similar to
> > > > > >>>>>>> Zookeeper directly. I think I should be able to bring this up
> > > > > >>>>>>> in the Zookeeper dev forum sometime soon, in the next few
> > > > > >>>>>>> weekends.
> > > > >>>>>>>
> > > > >>>>>>> -Jaikiran
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
> > > > >>>>>>>
> > > > >>>>>>>  It looks like the new KafkaZkClient is a wrapper around
> > > ZkClient,
> > > > >>>>>>>> but
> > > > >>>>>>>> not a replacement. Did I get it right?
> > > > >>>>>>>>
> > > > >>>>>>>> I think a wrapper for ZkClient can be useful - for example
> > > > >>>>>>>> KAFKA-1664
> > > > >>>>>>>> can also use one.
> > > > >>>>>>>>
> > > > >>>>>>>> However, I'm wondering why not contribute the fix directly
> to
> > > > >>>>>>>> ZKClient
> > > > >>>>>>>> project and ask for a release that contains the fix?
> > > > > >>>>>>>> This will benefit other users of the project who may also
> > > > > >>>>>>>> need a timeout (that's pretty basic...)
> > > > >>>>>>>>
> > > > >>>>>>>> As an alternative, if we don't want to collaborate with
> > > ZKClient for
> > > > >>>>>>>> some reason, forking the project into Kafka will probably
> give
> > > us
> > > > >>>>>>>> more
> > > > >>>>>>>> control than wrappers and without much downside.
> > > > >>>>>>>>
> > > > >>>>>>>> Just a thought.
> > > > >>>>>>>>
> > > > >>>>>>>> Gwen
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
> > > > >>>>>>>> <ja...@gmail.com>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>  Neha, Ewen (and others), my initial attempt to solve this
> is
> > > > >>>>>>>>> uploaded
> > > > >>>>>>>>> here
> > > > >>>>>>>>> https://reviews.apache.org/r/30477/. It solves the
> shutdown
> > > > >>>>>>>>> problem
> > > > >>>>>>>>> and
> > > > >>>>>>>>> now
> > > > >>>>>>>>> the server shuts down even when Zookeeper has gone down
> > before
> > > the
> > > > >>>>>>>>> Kafka
> > > > >>>>>>>>> server.
> > > > >>>>>>>>>
> > > > >>>>>>>>> I went with the approach of introducing a custom (enhanced)
> > > > >>>>>>>>> ZkClient
> > > > >>>>>>>>> which
> > > > >>>>>>>>> for now allows time outs to be optionally specified for
> > certain
> > > > >>>>>>>>> operations.
> > > > >>>>>>>>> I intentionally haven't forced the use of this new
> > > KafkaZkClient
> > > > >>>>>>>>> all
> > > > >>>>>>>>> over
> > > > >>>>>>>>> the code and instead for now have just used it in the
> > > KafkaServer.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Does this patch look like something worth using?
> > > > >>>>>>>>>
> > > > >>>>>>>>> -Jaikiran
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>  Ewen is right. ZkClient APIs are blocking and the right
> fix
> > > for
> > > > >>>>>>>>>> this
> > > > >>>>>>>>>> seems
> > > > >>>>>>>>>> to be patching ZkClient. At some point, if we find
> ourselves
> > > > >>>>>>>>>> fiddling
> > > > >>>>>>>>>> too
> > > > >>>>>>>>>> much with ZkClient, it wouldn't hurt to write our own
> little
> > > > >>>>>>>>>> zookeeper
> > > > >>>>>>>>>> client wrapper.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
> > > > >>>>>>>>>> <ew...@confluent.io>
> > > > >>>>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Looks like a bug to me -- the underlying ZK library wraps a
> > > > > >>>>>>>>>>> lot of blocking method implementations with
> > > > > >>>>>>>>>>> waitUntilConnected() calls without any timeouts. Ideally we
> > > > > >>>>>>>>>>> could just add a version of ZkUtils.getController() with a
> > > > > >>>>>>>>>>> timeout, but I don't see an easy way to accomplish that
> > > > > >>>>>>>>>>> with ZkClient.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> There's at least one other call to ZkUtils besides the one
> > > > > >>>>>>>>>>> in the stacktrace you gave that would cause the same issue,
> > > > > >>>>>>>>>>> possibly more that aren't directly called in that method.
> > > > > >>>>>>>>>>> One ugly solution would be to use an extra thread during
> > > > > >>>>>>>>>>> shutdown to trigger timeouts, but I'd imagine we probably
> > > > > >>>>>>>>>>> have other threads that could end up blocking in similar
> > > > > >>>>>>>>>>> ways.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to
> > > > > >>>>>>>>>>> track the issue.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
> > > > >>>>>>>>>>> jai.forums2013@gmail.com>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>    The main culprit is this thread which goes into
> "forever
> > > retry
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> connection
> > > > >>>>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a
> Ctrl +
> > > C)
> > > > >>>>>>>>>>>> after
> > > > >>>>>>>>>>>> zookeeper has already been shutdown. I have attached the
> > > > >>>>>>>>>>>> complete
> > > > >>>>>>>>>>>> thread
> > > > >>>>>>>>>>>> dump, but I don't know if it will be delivered to the
> > > mailing
> > > > >>>>>>>>>>>> list.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on
> > > > >>>>>>>>>>>> condition
> > > > >>>>>>>>>>>> [0x6ad69000]
> > > > >>>>>>>>>>>>        java.lang.Thread.State: TIMED_WAITING (parking)
> > > > >>>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
> > > > >>>>>>>>>>>>         - parking to wait for  <0x70a93368> (a
> > > > >>>>>>>>>>>> java.util.concurrent.locks.
> > > > >>>>>>>>>>>> AbstractQueuedSynchronizer$ConditionObject)
> > > > >>>>>>>>>>>>         at
> > java.util.concurrent.locks.LockSupport.parkUntil(
> > > > >>>>>>>>>>>> LockSupport.java:267)
> > > > >>>>>>>>>>>>         at java.util.concurrent.locks.
> > > > >>>>>>>>>>>> AbstractQueuedSynchronizer$
> > > > >>>>>>>>>>>> ConditionObject.awaitUntil(AbstractQueuedSynchronizer.
> > > > >>>>>>>>>>>> java:2130)
> > > > >>>>>>>>>>>>         at
> > > > >>>>>>>>>>>>
> org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.
> > > > >>>>>>>>>>>> java:636)
> > > > >>>>>>>>>>>>         at
> > > > >>>>>>>>>>>>
> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> > > > >>>>>>>>>>>> java:619)
> > > > >>>>>>>>>>>>         at
> > > > >>>>>>>>>>>>
> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> > > > >>>>>>>>>>>> java:615)
> > > > >>>>>>>>>>>>         at
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
> > > > >>>>>>>>>>> java:679)
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>          at org.I0Itec.zkclient.ZkClient.
> > > > >>>>>>>>>>>> readData(ZkClient.java:766)
> > > > >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.
> > > > >>>>>>>>>>>> readData(ZkClient.java:761)
> > > > >>>>>>>>>>>>         at
> > > > >>>>>>>>>>>>
> kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
> > > > >>>>>>>>>>>>         at
> > > kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
> > > > >>>>>>>>>>>>         at
> > > kafka.server.KafkaServer.kafka$server$KafkaServer$$
> > > > >>>>>>>>>>>> controlledShutdown(KafkaServer.scala:194)
> > > > >>>>>>>>>>>>         at kafka.server.KafkaServer$$
> > > > >>>>>>>>>>>> anonfun$shutdown$1.apply$mcV$
> > > > >>>>>>>>>>>> sp(KafkaServer.scala:269)
> > > > >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
> > > > >>>>>>>>>>>>         at kafka.utils.Logging$class.
> > > > >>>>>>>>>>>> swallowWarn(Logging.scala:92)
> > > > >>>>>>>>>>>>         at
> kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> > > > >>>>>>>>>>>>         at
> > > kafka.utils.Logging$class.swallow(Logging.scala:94)
> > > > >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
> > > > >>>>>>>>>>>>         at
> > > kafka.server.KafkaServer.shutdown(KafkaServer.scala:
> > > > >>>>>>>>>>>> 269)
> > > > >>>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(
> > > > >>>>>>>>>>>> KafkaServerStartable.scala:42)
> > > > >>>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> -Jaikiran
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>    For a clean shutdown, the broker tries to talk to the
> > > > >>>>>>>>>>>> controller
> > > > >>>>>>>>>>>> and
> > > > >>>>>>>>>>>> also
> > > > >>>>>>>>>>>> issues reads to zookeeper. Possibly that is where it
> tries
> > > to
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> reconnect
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>  to
> > > > >>>>>>>>>>>> zk. It will help to look at the thread dump.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks
> > > > >>>>>>>>>>>>> Neha
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
> > > > >>>>>>>>>>>>> jai.forums2013@gmail.com
> > > > >>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>      I was just playing around with the RC2 of 0.8.2
> and
> > > > >>>>>>>>>>>>> noticed
> > > > >>>>>>>>>>>>> that
> > > > >>>>>>>>>>>>> if I
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>  shutdown zookeeper first I can't shutdown Kafka server
> > at
> > > all
> > > > >>>>>>>>>>>>>> since
> > > > >>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>> goes
> > > > >>>>>>>>>>>>>> into a never ending attempt to reconnect with
> zookeeper.
> > > I had
> > > > >>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>> kill
> > > > >>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>> Kafka process to stop it. I tried it against trunk too
> > and
> > > > >>>>>>>>>>>>>> there
> > > > >>>>>>>>>>>>>> too I
> > > > >>>>>>>>>>>>>> see
> > > > >>>>>>>>>>>>>> the same issue. Should I file a JIRA for this and see
> if
> > > I can
> > > > >>>>>>>>>>>>>> come
> > > > >>>>>>>>>>>>>> up
> > > > >>>>>>>>>>>>>> with
> > > > >>>>>>>>>>>>>> a patch?
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent)
> > attempts
> > > at
> > > > >>>>>>>>>>>>>> trying
> > > > >>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>> reconnect. I've a thread dump too which shows that the
> > > other
> > > > >>>>>>>>>>>>>> thread
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>  which
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>> is trying to complete a controlled shutdown of Kafka is
> > > blocked
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> forever
> > > > >>>>>>>>>>>>>> for
> > > > >>>>>>>>>>>>>> the zookeeper to be up. I can attach it to the JIRA.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> 2015-01-24 10:15:46,278] WARN Session
> 0x14b1a4136800000
> > > for
> > > > >>>>>>>>>>>>>> server
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>  null,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>> unexpected error, closing socket connection and
> attempting
> > > > >>>>>>>>>>>> reconnect
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > > >>>>>>>>>>>>>>          at
> > > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > > >>>>>>>>>>>>>> Method)
> > > > >>>>>>>>>>>>>>          at
> sun.nio.ch.SocketChannelImpl.finishConnect(
> > > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > > > >>>>>>>>>>>>>> doTransport(
> > > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > > >>>>>>>>>>>>>>          at
> > > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > > >>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket
> connection
> > > to
> > > > >>>>>>>>>>>>>> server
> > > > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> > > authenticate
> > > > >>>>>>>>>>>>>> using
> > > > >>>>>>>>>>>>>> SASL
> > > > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > >>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session
> 0x14b1a4136800000
> > > for
> > > > >>>>>>>>>>>>>> server
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>  null,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>> unexpected error, closing socket connection and
> attempting
> > > > >>>>>>>>>>>> reconnect
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > > >>>>>>>>>>>>>>          at
> > > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > > >>>>>>>>>>>>>> Method)
> > > > >>>>>>>>>>>>>>          at
> sun.nio.ch.SocketChannelImpl.finishConnect(
> > > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > > > >>>>>>>>>>>>>> doTransport(
> > > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > > >>>>>>>>>>>>>>          at
> > > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > > >>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket
> connection
> > > to
> > > > >>>>>>>>>>>>>> server
> > > > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> > > authenticate
> > > > >>>>>>>>>>>>>> using
> > > > >>>>>>>>>>>>>> SASL
> > > > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > >>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session
> 0x14b1a4136800000
> > > for
> > > > >>>>>>>>>>>>>> server
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>  null,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>> unexpected error, closing socket connection and
> attempting
> > > > >>>>>>>>>>>> reconnect
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > > >>>>>>>>>>>>>>          at
> > > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > > >>>>>>>>>>>>>> Method)
> > > > >>>>>>>>>>>>>>          at
> sun.nio.ch.SocketChannelImpl.finishConnect(
> > > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > > > >>>>>>>>>>>>>> doTransport(
> > > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > > >>>>>>>>>>>>>>          at
> > > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > > >>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket
> connection
> > > to
> > > > >>>>>>>>>>>>>> server
> > > > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> > > authenticate
> > > > >>>>>>>>>>>>>> using
> > > > >>>>>>>>>>>>>> SASL
> > > > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > > >>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session
> 0x14b1a4136800000
> > > for
> > > > >>>>>>>>>>>>>> server
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>  null,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>> unexpected error, closing socket connection and
> attempting
> > > > >>>>>>>>>>>> reconnect
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > > >>>>>>>>>>>>>>          at
> > > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > > >>>>>>>>>>>>>> Method)
> > > > >>>>>>>>>>>>>>          at
> sun.nio.ch.SocketChannelImpl.finishConnect(
> > > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > > > >>>>>>>>>>>>>> doTransport(
> > > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > > >>>>>>>>>>>>>>          at
> > > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> -Jaikiran
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>    --
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>> Ewen
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>  --
> > > > >>>>>> -- Guozhang
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > >
> >
>
>
>
> --
>
> Regards,
> Ashish
>



-- 
-- Guozhang
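
For illustration, the "extra thread during shutdown to trigger timeouts"
idea that Ewen mentions in the quoted thread can be sketched with plain
java.util.concurrent. This is only a minimal sketch under stated
assumptions, not the patch attached to KAFKA-1907: BoundedCall and
callWithTimeout are hypothetical names, and the sleeping lambda merely
stands in for a ZkClient read that blocks while ZooKeeper is down. Whether
the real blocked call reacts to thread interruption is also an assumption.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long a caller waits on a blocking call.
public class BoundedCall {

    /**
     * Runs the call on a separate daemon thread and waits at most the given
     * timeout for its result. Returns Optional.empty() if the call did not
     * finish in time (or returned null).
     */
    public static <T> Optional<T> callWithTimeout(Callable<T> call, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "bounded-call");
            worker.setDaemon(true); // a stuck worker must not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(call);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker and give up
            return Optional.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a read that never returns while ZooKeeper is unreachable.
        Optional<String> stuck = callWithTimeout(() -> {
            Thread.sleep(60_000);
            return "controller-id";
        }, 200, TimeUnit.MILLISECONDS);
        System.out.println("blocked call produced a result: " + stuck.isPresent());

        // A call that completes promptly is returned unchanged.
        Optional<String> fast = callWithTimeout(() -> "controller-id", 200, TimeUnit.MILLISECONDS);
        System.out.println("fast call produced: " + fast.orElse("<none>"));
    }
}
```

Note that this bounds only the caller's wait. As pointed out in the thread,
other threads could still block in similar ways, which is one reason the
plan settled on adding a timeout inside the client itself.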

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Ashish Singh <as...@cloudera.com>.
+1 on using Curator.

On Tue, Feb 3, 2015 at 10:09 PM, Manikumar Reddy <ku...@nmsworks.co.in>
wrote:

> I think we should consider moving to Apache Curator (KAFKA-873).
> Curator is now more mature and an Apache top-level project.
>
>
> On Wed, Feb 4, 2015 at 11:29 AM, Harsha <ka...@harsha.io> wrote:
>
> > Any reason not to go with Apache Curator (http://curator.apache.org/)?
> > -Harsha
> > On Tue, Feb 3, 2015, at 09:55 PM, Guozhang Wang wrote:
> > > > I am also +1 on Neha's suggestion that "At some point, if we find
> > > > ourselves fiddling too much with ZkClient, it wouldn't hurt to write
> > > > our own little zookeeper client wrapper." We have accumulated a bunch
> > > > of issues with ZkClient that take a long time to be resolved, if
> > > > ever, so we have ended up with some hacky ways of handling ZkClient
> > > > errors.
> > >
> > > Guozhang
> > >
> > > On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <jai.forums2013@gmail.com
> >
> > > wrote:
> > >
> > > > Yes, that's the plan :)
> > > >
> > > > -Jaikiran
> > > >
> > > > On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> > > >
> > > >> So I think the current plan is:
> > > >> 1. Add timeout in zkclient
> > > > >> 2. Ask zkclient to release a new version (we need it for a few
> > > > >> other things too)
> > > > >> 3. Rebase on new zkclient
> > > > >> 4. Fix this jira and the few others that were waiting for the new
> > > > >> zkclient
> > > >>
> > > >> Does that make sense?
> > > >>
> > > >> Gwen
> > > >>
> > > >> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <
> > jai.forums2013@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> I just heard back from Stefan, who manages the ZkClient repo and he
> > > >>> seems to
> > > >>> be open to have these changes be part of ZkClient project. I'll be
> > > >>> creating
> > > >>> a pull request for that project to have it reviewed and merged.
> > Although
> > > >>> I
> > > >>> haven't heard of exact release plans, Stefan's reply did indicate
> > that
> > > >>> the
> > > >>> project could be released after this change is merged.
> > > >>>
> > > >>> -Jaikiran
> > > >>>
> > > >>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
> > > >>>
> > > >>>> Thanks for pointing to that repo!
> > > >>>>
> > > > >>>> I just had a look at it and it appears that the project isn't
> > > > >>>> very active (going by the lack of recent activity). The latest
> > > > >>>> contribution is from Gwen and that was around 3 months back. I
> > > > >>>> haven't found release plans for that project or a place to ask
> > > > >>>> about it (filing an issue doesn't seem the right way to ask this
> > > > >>>> question). So I'll get in touch with the repo owner and see what
> > > > >>>> his plans for the project are.
> > > >>>>
> > > >>>> -Jaikiran
> > > >>>>
> > > >>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
> > > >>>>
> > > >>>>> I did!
> > > >>>>>
> > > >>>>> Thanks for clarifying :)
> > > >>>>>
> > > >>>>> The client that is part of Zookeeper itself actually does support
> > > >>>>> timeouts.
> > > >>>>>
> > > >>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <
> wangguoz@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi Jaikiran,
> > > >>>>>>
> > > >>>>>> I think Gwen was talking about contributing to ZkClient project:
> > > >>>>>>
> > > >>>>>> https://github.com/sgroschupf/zkclient
> > > >>>>>>
> > > >>>>>> Guozhang
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <
> > > >>>>>> jai.forums2013@gmail.com>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>  Hi Gwen,
> > > >>>>>>>
> > > >>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a
> > > >>>>>>> complete
> > > >>>>>>> replacement.
> > > >>>>>>>
> > > > >>>>>>> As for contributing to Zookeeper, yes, that indeed is on my
> > > > >>>>>>> mind, but I haven't yet had a chance to really look deeper
> > > > >>>>>>> into Zookeeper or get in touch with their dev team to try and
> > > > >>>>>>> explain this potential improvement to them. I have no
> > > > >>>>>>> objection to contributing this or something similar to
> > > > >>>>>>> Zookeeper directly. I think I should be able to bring this up
> > > > >>>>>>> in the Zookeeper dev forum sometime soon, in the next few
> > > > >>>>>>> weekends.
> > > >>>>>>>
> > > >>>>>>> -Jaikiran
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
> > > >>>>>>>
> > > >>>>>>>  It looks like the new KafkaZkClient is a wrapper around
> > ZkClient,
> > > >>>>>>>> but
> > > >>>>>>>> not a replacement. Did I get it right?
> > > >>>>>>>>
> > > >>>>>>>> I think a wrapper for ZkClient can be useful - for example
> > > >>>>>>>> KAFKA-1664
> > > >>>>>>>> can also use one.
> > > >>>>>>>>
> > > >>>>>>>> However, I'm wondering why not contribute the fix directly to
> > > >>>>>>>> ZKClient
> > > >>>>>>>> project and ask for a release that contains the fix?
> > > > >>>>>>>> This will benefit other users of the project who may also
> > > > >>>>>>>> need a timeout (that's pretty basic...)
> > > >>>>>>>>
> > > >>>>>>>> As an alternative, if we don't want to collaborate with
> > ZKClient for
> > > >>>>>>>> some reason, forking the project into Kafka will probably give
> > us
> > > >>>>>>>> more
> > > >>>>>>>> control than wrappers and without much downside.
> > > >>>>>>>>
> > > >>>>>>>> Just a thought.
> > > >>>>>>>>
> > > >>>>>>>> Gwen
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
> > > >>>>>>>> <ja...@gmail.com>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>  Neha, Ewen (and others), my initial attempt to solve this is
> > > >>>>>>>>> uploaded
> > > >>>>>>>>> here
> > > >>>>>>>>> https://reviews.apache.org/r/30477/. It solves the shutdown
> > > >>>>>>>>> problem
> > > >>>>>>>>> and
> > > >>>>>>>>> now
> > > >>>>>>>>> the server shuts down even when Zookeeper has gone down
> before
> > the
> > > >>>>>>>>> Kafka
> > > >>>>>>>>> server.
> > > >>>>>>>>>
> > > >>>>>>>>> I went with the approach of introducing a custom (enhanced)
> > > >>>>>>>>> ZkClient
> > > >>>>>>>>> which
> > > >>>>>>>>> for now allows time outs to be optionally specified for
> certain
> > > >>>>>>>>> operations.
> > > >>>>>>>>> I intentionally haven't forced the use of this new
> > KafkaZkClient
> > > >>>>>>>>> all
> > > >>>>>>>>> over
> > > >>>>>>>>> the code and instead for now have just used it in the
> > KafkaServer.
> > > >>>>>>>>>
> > > >>>>>>>>> Does this patch look like something worth using?
> > > >>>>>>>>>
> > > >>>>>>>>> -Jaikiran
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>  Ewen is right. ZkClient APIs are blocking and the right fix
> > for
> > > >>>>>>>>>> this
> > > >>>>>>>>>> seems
> > > >>>>>>>>>> to be patching ZkClient. At some point, if we find ourselves
> > > >>>>>>>>>> fiddling
> > > >>>>>>>>>> too
> > > >>>>>>>>>> much with ZkClient, it wouldn't hurt to write our own little
> > > >>>>>>>>>> zookeeper
> > > >>>>>>>>>> client wrapper.
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
> > > >>>>>>>>>> <ew...@confluent.io>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>    Looks like a bug to me -- the underlying ZK library
> wraps a
> > > >>>>>>>>>> lot of
> > > >>>>>>>>>>
> > > >>>>>>>>>>> blocking
> > > >>>>>>>>>>> method implementations with waitUntilConnected() calls
> > without
> > > >>>>>>>>>>> any
> > > >>>>>>>>>>> timeouts. Ideally we could just add a version of
> > > >>>>>>>>>>> ZkUtils.getController()
> > > >>>>>>>>>>> with a timeout, but I don't see an easy way to accomplish
> > that
> > > >>>>>>>>>>> with
> > > >>>>>>>>>>> ZkClient.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> There's at least one other call to ZkUtils besides the one
> > in the
> > > >>>>>>>>>>> stacktrace you gave that would cause the same issue,
> possibly
> > > >>>>>>>>>>> more
> > > >>>>>>>>>>> that
> > > >>>>>>>>>>> aren't directly called in that method. One ugly solution
> > would be
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>> use
> > > >>>>>>>>>>> an
> > > >>>>>>>>>>> extra thread during shutdown to trigger timeouts, but I'd
> > imagine
> > > >>>>>>>>>>> we
> > > >>>>>>>>>>> probably have other threads that could end up blocking in
> > similar
> > > >>>>>>>>>>> ways.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907
> to
> > > >>>>>>>>>>> track
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>> issue.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
> > > >>>>>>>>>>> jai.forums2013@gmail.com>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>    The main culprit is this thread which goes into "forever
> > retry
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> connection
> > > >>>>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a Ctrl +
> > C)
> > > >>>>>>>>>>>> after
> > > >>>>>>>>>>>> zookeeper has already been shutdown. I have attached the
> > > >>>>>>>>>>>> complete
> > > >>>>>>>>>>>> thread
> > > >>>>>>>>>>>> dump, but I don't know if it will be delivered to the
> > mailing
> > > >>>>>>>>>>>> list.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on
> > > >>>>>>>>>>>> condition
> > > >>>>>>>>>>>> [0x6ad69000]
> > > >>>>>>>>>>>>        java.lang.Thread.State: TIMED_WAITING (parking)
> > > >>>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
> > > >>>>>>>>>>>>         - parking to wait for  <0x70a93368> (a
> > > >>>>>>>>>>>> java.util.concurrent.locks.
> > > >>>>>>>>>>>> AbstractQueuedSynchronizer$ConditionObject)
> > > >>>>>>>>>>>>         at
> java.util.concurrent.locks.LockSupport.parkUntil(
> > > >>>>>>>>>>>> LockSupport.java:267)
> > > >>>>>>>>>>>>         at java.util.concurrent.locks.
> > > >>>>>>>>>>>> AbstractQueuedSynchronizer$
> > > >>>>>>>>>>>> ConditionObject.awaitUntil(AbstractQueuedSynchronizer.
> > > >>>>>>>>>>>> java:2130)
> > > >>>>>>>>>>>>         at
> > > >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.
> > > >>>>>>>>>>>> java:636)
> > > >>>>>>>>>>>>         at
> > > >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> > > >>>>>>>>>>>> java:619)
> > > >>>>>>>>>>>>         at
> > > >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> > > >>>>>>>>>>>> java:615)
> > > >>>>>>>>>>>>         at
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
> > > >>>>>>>>>>> java:679)
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>          at org.I0Itec.zkclient.ZkClient.
> > > >>>>>>>>>>>> readData(ZkClient.java:766)
> > > >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.
> > > >>>>>>>>>>>> readData(ZkClient.java:761)
> > > >>>>>>>>>>>>         at
> > > >>>>>>>>>>>> kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
> > > >>>>>>>>>>>>         at
> > kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
> > > >>>>>>>>>>>>         at
> > kafka.server.KafkaServer.kafka$server$KafkaServer$$
> > > >>>>>>>>>>>> controlledShutdown(KafkaServer.scala:194)
> > > >>>>>>>>>>>>         at kafka.server.KafkaServer$$
> > > >>>>>>>>>>>> anonfun$shutdown$1.apply$mcV$
> > > >>>>>>>>>>>> sp(KafkaServer.scala:269)
> > > >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
> > > >>>>>>>>>>>>         at kafka.utils.Logging$class.
> > > >>>>>>>>>>>> swallowWarn(Logging.scala:92)
> > > >>>>>>>>>>>>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> > > >>>>>>>>>>>>         at
> > kafka.utils.Logging$class.swallow(Logging.scala:94)
> > > >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
> > > >>>>>>>>>>>>         at
> > kafka.server.KafkaServer.shutdown(KafkaServer.scala:
> > > >>>>>>>>>>>> 269)
> > > >>>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(
> > > >>>>>>>>>>>> KafkaServerStartable.scala:42)
> > > >>>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> -Jaikiran
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>    For a clean shutdown, the broker tries to talk to the
> > > >>>>>>>>>>>> controller
> > > >>>>>>>>>>>> and
> > > >>>>>>>>>>>> also
> > > >>>>>>>>>>>> issues reads to zookeeper. Possibly that is where it tries
> > to
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> reconnect
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  to
> > > >>>>>>>>>>>> zk. It will help to look at the thread dump.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks
> > > >>>>>>>>>>>>> Neha
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
> > > >>>>>>>>>>>>> jai.forums2013@gmail.com
> > > >>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>      I was just playing around with the RC2 of 0.8.2 and
> > > >>>>>>>>>>>>> noticed
> > > >>>>>>>>>>>>> that
> > > >>>>>>>>>>>>> if I
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  shutdown zookeeper first I can't shutdown Kafka server
> at
> > all
> > > >>>>>>>>>>>>>> since
> > > >>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>> goes
> > > >>>>>>>>>>>>>> into a never ending attempt to reconnect with zookeeper.
> > I had
> > > >>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>> kill
> > > >>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>> Kafka process to stop it. I tried it against trunk too
> and
> > > >>>>>>>>>>>>>> there
> > > >>>>>>>>>>>>>> too I
> > > >>>>>>>>>>>>>> see
> > > >>>>>>>>>>>>>> the same issue. Should I file a JIRA for this and see if
> > I can
> > > >>>>>>>>>>>>>> come
> > > >>>>>>>>>>>>>> up
> > > >>>>>>>>>>>>>> with
> > > >>>>>>>>>>>>>> a patch?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent)
> attempts
> > at
> > > >>>>>>>>>>>>>> trying
> > > >>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>> reconnect. I've a thread dump too which shows that the
> > other
> > > >>>>>>>>>>>>>> thread
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>  which
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>> is trying to complete a controlled shutdown of Kafka is
> > blocked
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> forever
> > > >>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>> the zookeeper to be up. I can attach it to the JIRA.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000
> > for
> > > >>>>>>>>>>>>>> server
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>  null,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> > > >>>>>>>>>>>> reconnect
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > >>>>>>>>>>>>>>          at
> > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > >>>>>>>>>>>>>> Method)
> > > >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > > >>>>>>>>>>>>>> doTransport(
> > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > >>>>>>>>>>>>>>          at
> > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > >>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection
> > to
> > > >>>>>>>>>>>>>> server
> > > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> > authenticate
> > > >>>>>>>>>>>>>> using
> > > >>>>>>>>>>>>>> SASL
> > > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > >>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000
> > for
> > > >>>>>>>>>>>>>> server
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>  null,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> > > >>>>>>>>>>>> reconnect
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > >>>>>>>>>>>>>>          at
> > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > >>>>>>>>>>>>>> Method)
> > > >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > > >>>>>>>>>>>>>> doTransport(
> > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > >>>>>>>>>>>>>>          at
> > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > >>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection
> > to
> > > >>>>>>>>>>>>>> server
> > > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> > authenticate
> > > >>>>>>>>>>>>>> using
> > > >>>>>>>>>>>>>> SASL
> > > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > >>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000
> > for
> > > >>>>>>>>>>>>>> server
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>  null,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> > > >>>>>>>>>>>> reconnect
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > >>>>>>>>>>>>>>          at
> > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > >>>>>>>>>>>>>> Method)
> > > >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > > >>>>>>>>>>>>>> doTransport(
> > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > >>>>>>>>>>>>>>          at
> > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > >>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection
> > to
> > > >>>>>>>>>>>>>> server
> > > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> > authenticate
> > > >>>>>>>>>>>>>> using
> > > >>>>>>>>>>>>>> SASL
> > > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > > >>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000
> > for
> > > >>>>>>>>>>>>>> server
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>  null,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> > > >>>>>>>>>>>> reconnect
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > > >>>>>>>>>>>>>>          at
> > sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > > >>>>>>>>>>>>>> Method)
> > > >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> > > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > > >>>>>>>>>>>>>> doTransport(
> > > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > > >>>>>>>>>>>>>>          at
> > org.apache.zookeeper.ClientCnxn$SendThread.run(
> > > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> -Jaikiran
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>    --
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>> Ewen
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>  --
> > > >>>>>> -- Guozhang
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> >
>



-- 

Regards,
Ashish

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Manikumar Reddy <ku...@nmsworks.co.in>.
I think we should consider moving to Apache Curator (KAFKA-873).
Curator is now more mature and is an Apache top-level project.
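
For readers following the timeout discussion quoted below, here is a minimal
sketch of the "extra thread during shutdown" idea that Ewen mentions: a
blocking call is run on a helper thread and the caller waits only for a
bounded time. It uses only java.util.concurrent. The class name BoundedCall,
the method callWithTimeout, and the sleeping task that stands in for a
ZooKeeper read are assumptions made for illustration; this is not the patch
under review and not part of ZkClient or Curator.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not taken from Kafka, ZkClient, or Curator.
final class BoundedCall {

    // Runs a blocking task on a helper daemon thread and waits at most `timeout`.
    // Returns Optional.empty() on timeout or interruption. Note that a task
    // returning null is also reported as empty.
    static <T> Optional<T> callWithTimeout(Callable<T> task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck call must never keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the helper thread
            return Optional.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a ZooKeeper read that blocks because the server is gone.
        Optional<String> controller = callWithTimeout(() -> {
            Thread.sleep(60_000);
            return "controller";
        }, 200, TimeUnit.MILLISECONDS);
        System.out.println(controller.isPresent()
                ? controller.get()
                : "timed out, continuing shutdown");
    }
}
```

As noted in the thread, this only bounds the caller's wait. Other threads
can still block in similar ways, which is why a timeout inside the client
library (or a move to a different client) is the preferred fix.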


On Wed, Feb 4, 2015 at 11:29 AM, Harsha <ka...@harsha.io> wrote:

> Is there any reason not to go with Apache Curator (http://curator.apache.org/)?
> -Harsha
> On Tue, Feb 3, 2015, at 09:55 PM, Guozhang Wang wrote:
> > I am also +1 on Neha's suggestion that "At some point, if we find
> > ourselves
> > fiddling too much with ZkClient, it wouldn't hurt to write our own little
> > zookeeper client wrapper." since we have accumulated a bunch of issues
> > with
> > zkClient which take a long time to be resolved, if ever, so we ended up with
> > some hacky ways of handling zkClient errors.
> >
> > Guozhang
> >
> > On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <ja...@gmail.com>
> > wrote:
> >
> > > Yes, that's the plan :)
> > >
> > > -Jaikiran
> > >
> > > On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> > >
> > >> So I think the current plan is:
> > >> 1. Add timeout in zkclient
> > >> 2. Ask zkclient to release new version (we need it for few other
> things
> > >> too)
> > >> 3. Rebase on new zkclient
> > >> 4. Fix this jira and the few others that were waiting for the new zkclient
> > >>
> > >> Does that make sense?
> > >>
> > >> Gwen
> > >>
> > >> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <
> jai.forums2013@gmail.com>
> > >> wrote:
> > >>
> > >>> I just heard back from Stefan, who manages the ZkClient repo and he
> > >>> seems to
> > >>> be open to have these changes be part of ZkClient project. I'll be
> > >>> creating
> > >>> a pull request for that project to have it reviewed and merged.
> Although
> > >>> I
> > >>> haven't heard of exact release plans, Stefan's reply did indicate
> that
> > >>> the
> > >>> project could be released after this change is merged.
> > >>>
> > >>> -Jaikiran
> > >>>
> > >>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
> > >>>
> > >>>> Thanks for pointing to that repo!
> > >>>>
> > >>>> I just had a look at it and it appears that the project isn't much
> > >>>> active
> > >>>> (going by the lack of activity). The latest contribution is from
> Gwen
> > >>>> and
> > >>>> that was around 3 months back. I haven't found release plans for
> that
> > >>>> project or a place to ask about it (filing an issue doesn't seem
> right
> > >>>> to
> > >>>> ask this question). So I'll get in touch with the repo owner and see
> > >>>> what
> > >>>> his plans for the project are.
> > >>>>
> > >>>> -Jaikiran
> > >>>>
> > >>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
> > >>>>
> > >>>>> I did!
> > >>>>>
> > >>>>> Thanks for clarifying :)
> > >>>>>
> > >>>>> The client that is part of Zookeeper itself actually does support
> > >>>>> timeouts.
> > >>>>>
> > >>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wa...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi Jaikiran,
> > >>>>>>
> > >>>>>> I think Gwen was talking about contributing to ZkClient project:
> > >>>>>>
> > >>>>>> https://github.com/sgroschupf/zkclient
> > >>>>>>
> > >>>>>> Guozhang
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <
> > >>>>>> jai.forums2013@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>  Hi Gwen,
> > >>>>>>>
> > >>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a
> > >>>>>>> complete
> > >>>>>>> replacement.
> > >>>>>>>
> > >>>>>>> As for contributing to Zookeeper, yes that indeed is on my mind, but
> > >>>>>>> I
> > >>>>>>> haven't yet had a chance to really look deeper into Zookeeper or
> get
> > >>>>>>> in
> > >>>>>>> touch with their dev team to try and explain this potential
> > >>>>>>> improvement
> > >>>>>>> to
> > >>>>>>> them. I have no objection to contributing this or something
> similar
> > >>>>>>> to
> > >>>>>>> Zookeeper directly. I think I should be able to bring this up in
> the
> > >>>>>>> Zookeeper dev forum, sometime soon in the next few weekends.
> > >>>>>>>
> > >>>>>>> -Jaikiran
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
> > >>>>>>>
> > >>>>>>>  It looks like the new KafkaZkClient is a wrapper around
> ZkClient,
> > >>>>>>>> but
> > >>>>>>>> not a replacement. Did I get it right?
> > >>>>>>>>
> > >>>>>>>> I think a wrapper for ZkClient can be useful - for example
> > >>>>>>>> KAFKA-1664
> > >>>>>>>> can also use one.
> > >>>>>>>>
> > >>>>>>>> However, I'm wondering why not contribute the fix directly to
> > >>>>>>>> ZKClient
> > >>>>>>>> project and ask for a release that contains the fix?
> > >>>>>>>> This will benefit other users of the project who may also need a
> > >>>>>>>> timeout (that's pretty basic...)
> > >>>>>>>>
> > >>>>>>>> As an alternative, if we don't want to collaborate with
> ZKClient for
> > >>>>>>>> some reason, forking the project into Kafka will probably give
> us
> > >>>>>>>> more
> > >>>>>>>> control than wrappers and without much downside.
> > >>>>>>>>
> > >>>>>>>> Just a thought.
> > >>>>>>>>
> > >>>>>>>> Gwen
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
> > >>>>>>>> <ja...@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>  Neha, Ewen (and others), my initial attempt to solve this is
> > >>>>>>>>> uploaded
> > >>>>>>>>> here
> > >>>>>>>>> https://reviews.apache.org/r/30477/. It solves the shutdown
> > >>>>>>>>> problem
> > >>>>>>>>> and
> > >>>>>>>>> now
> > >>>>>>>>> the server shuts down even when Zookeeper has gone down before
> the
> > >>>>>>>>> Kafka
> > >>>>>>>>> server.
> > >>>>>>>>>
> > >>>>>>>>> I went with the approach of introducing a custom (enhanced)
> > >>>>>>>>> ZkClient
> > >>>>>>>>> which
> > >>>>>>>>> for now allows time outs to be optionally specified for certain
> > >>>>>>>>> operations.
> > >>>>>>>>> I intentionally haven't forced the use of this new
> KafkaZkClient
> > >>>>>>>>> all
> > >>>>>>>>> over
> > >>>>>>>>> the code and instead for now have just used it in the
> KafkaServer.
> > >>>>>>>>>
> > >>>>>>>>> Does this patch look like something worth using?
> > >>>>>>>>>
> > >>>>>>>>> -Jaikiran
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
> > >>>>>>>>>
> > >>>>>>>>>  Ewen is right. ZkClient APIs are blocking and the right fix
> for
> > >>>>>>>>>> this
> > >>>>>>>>>> seems
> > >>>>>>>>>> to be patching ZkClient. At some point, if we find ourselves
> > >>>>>>>>>> fiddling
> > >>>>>>>>>> too
> > >>>>>>>>>> much with ZkClient, it wouldn't hurt to write our own little
> > >>>>>>>>>> zookeeper
> > >>>>>>>>>> client wrapper.
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
> > >>>>>>>>>> <ew...@confluent.io>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>    Looks like a bug to me -- the underlying ZK library wraps a
> > >>>>>>>>>> lot of
> > >>>>>>>>>>
> > >>>>>>>>>>> blocking
> > >>>>>>>>>>> method implementations with waitUntilConnected() calls
> without
> > >>>>>>>>>>> any
> > >>>>>>>>>>> timeouts. Ideally we could just add a version of
> > >>>>>>>>>>> ZkUtils.getController()
> > >>>>>>>>>>> with a timeout, but I don't see an easy way to accomplish
> that
> > >>>>>>>>>>> with
> > >>>>>>>>>>> ZkClient.
> > >>>>>>>>>>>
> > >>>>>>>>>>> There's at least one other call to ZkUtils besides the one
> in the
> > >>>>>>>>>>> stacktrace you gave that would cause the same issue, possibly
> > >>>>>>>>>>> more
> > >>>>>>>>>>> that
> > >>>>>>>>>>> aren't directly called in that method. One ugly solution
> would be
> > >>>>>>>>>>> to
> > >>>>>>>>>>> use
> > >>>>>>>>>>> an
> > >>>>>>>>>>> extra thread during shutdown to trigger timeouts, but I'd
> imagine
> > >>>>>>>>>>> we
> > >>>>>>>>>>> probably have other threads that could end up blocking in
> similar
> > >>>>>>>>>>> ways.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to
> > >>>>>>>>>>> track
> > >>>>>>>>>>> the
> > >>>>>>>>>>> issue.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
> > >>>>>>>>>>> jai.forums2013@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>    The main culprit is this thread which goes into "forever
> retry
> > >>>>>>>>>>>
> > >>>>>>>>>>>> connection
> > >>>>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a Ctrl +
> C)
> > >>>>>>>>>>>> after
> > >>>>>>>>>>>> zookeeper has already been shutdown. I have attached the
> > >>>>>>>>>>>> complete
> > >>>>>>>>>>>> thread
> > >>>>>>>>>>>> dump, but I don't know if it will be delivered to the
> mailing
> > >>>>>>>>>>>> list.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on
> > >>>>>>>>>>>> condition
> > >>>>>>>>>>>> [0x6ad69000]
> > >>>>>>>>>>>>        java.lang.Thread.State: TIMED_WAITING (parking)
> > >>>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
> > >>>>>>>>>>>>         - parking to wait for  <0x70a93368> (a
> > >>>>>>>>>>>> java.util.concurrent.locks.
> > >>>>>>>>>>>> AbstractQueuedSynchronizer$ConditionObject)
> > >>>>>>>>>>>>         at java.util.concurrent.locks.LockSupport.parkUntil(
> > >>>>>>>>>>>> LockSupport.java:267)
> > >>>>>>>>>>>>         at java.util.concurrent.locks.
> > >>>>>>>>>>>> AbstractQueuedSynchronizer$
> > >>>>>>>>>>>> ConditionObject.awaitUntil(AbstractQueuedSynchronizer.
> > >>>>>>>>>>>> java:2130)
> > >>>>>>>>>>>>         at
> > >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.
> > >>>>>>>>>>>> java:636)
> > >>>>>>>>>>>>         at
> > >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> > >>>>>>>>>>>> java:619)
> > >>>>>>>>>>>>         at
> > >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> > >>>>>>>>>>>> java:615)
> > >>>>>>>>>>>>         at
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>  org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
> > >>>>>>>>>>> java:679)
> > >>>>>>>>>>>
> > >>>>>>>>>>>          at org.I0Itec.zkclient.ZkClient.
> > >>>>>>>>>>>> readData(ZkClient.java:766)
> > >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.
> > >>>>>>>>>>>> readData(ZkClient.java:761)
> > >>>>>>>>>>>>         at
> > >>>>>>>>>>>> kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
> > >>>>>>>>>>>>         at
> kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
> > >>>>>>>>>>>>         at
> kafka.server.KafkaServer.kafka$server$KafkaServer$$
> > >>>>>>>>>>>> controlledShutdown(KafkaServer.scala:194)
> > >>>>>>>>>>>>         at kafka.server.KafkaServer$$
> > >>>>>>>>>>>> anonfun$shutdown$1.apply$mcV$
> > >>>>>>>>>>>> sp(KafkaServer.scala:269)
> > >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
> > >>>>>>>>>>>>         at kafka.utils.Logging$class.
> > >>>>>>>>>>>> swallowWarn(Logging.scala:92)
> > >>>>>>>>>>>>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> > >>>>>>>>>>>>         at
> kafka.utils.Logging$class.swallow(Logging.scala:94)
> > >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
> > >>>>>>>>>>>>         at
> kafka.server.KafkaServer.shutdown(KafkaServer.scala:
> > >>>>>>>>>>>> 269)
> > >>>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(
> > >>>>>>>>>>>> KafkaServerStartable.scala:42)
> > >>>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> -Jaikiran
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>    For a clean shutdown, the broker tries to talk to the
> > >>>>>>>>>>>> controller
> > >>>>>>>>>>>> and
> > >>>>>>>>>>>> also
> > >>>>>>>>>>>> issues reads to zookeeper. Possibly that is where it tries
> to
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> reconnect
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>  to
> > >>>>>>>>>>>> zk. It will help to look at the thread dump.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks
> > >>>>>>>>>>>>> Neha
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
> > >>>>>>>>>>>>> jai.forums2013@gmail.com
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>      I was just playing around with the RC2 of 0.8.2 and
> > >>>>>>>>>>>>> noticed
> > >>>>>>>>>>>>> that
> > >>>>>>>>>>>>> if I
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>  shutdown zookeeper first I can't shutdown Kafka server at
> all
> > >>>>>>>>>>>>>> since
> > >>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>> goes
> > >>>>>>>>>>>>>> into a never ending attempt to reconnect with zookeeper.
> I had
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>> kill
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>> Kafka process to stop it. I tried it against trunk too and
> > >>>>>>>>>>>>>> there
> > >>>>>>>>>>>>>> too I
> > >>>>>>>>>>>>>> see
> > >>>>>>>>>>>>>> the same issue. Should I file a JIRA for this and see if
> I can
> > >>>>>>>>>>>>>> come
> > >>>>>>>>>>>>>> up
> > >>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>> a patch?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts
> at
> > >>>>>>>>>>>>>> trying
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>> reconnect. I've a thread dump too which shows that the
> other
> > >>>>>>>>>>>>>> thread
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>  which
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> is trying to complete a controlled shutdown of Kafka is
> blocked
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> forever
> > >>>>>>>>>>>>>> for
> > >>>>>>>>>>>>>> the zookeeper to be up. I can attach it to the JIRA.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for
> > >>>>>>>>>>>>>> server
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>  null,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> > >>>>>>>>>>>> reconnect
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > >>>>>>>>>>>>>>          at
> sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > >>>>>>>>>>>>>> Method)
> > >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > >>>>>>>>>>>>>> doTransport(
> > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > >>>>>>>>>>>>>>          at
> org.apache.zookeeper.ClientCnxn$SendThread.run(
> > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > >>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection
> to
> > >>>>>>>>>>>>>> server
> > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> authenticate
> > >>>>>>>>>>>>>> using
> > >>>>>>>>>>>>>> SASL
> > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > >>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000
> for
> > >>>>>>>>>>>>>> server
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>  null,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> > >>>>>>>>>>>> reconnect
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > >>>>>>>>>>>>>>          at
> sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > >>>>>>>>>>>>>> Method)
> > >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > >>>>>>>>>>>>>> doTransport(
> > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > >>>>>>>>>>>>>>          at
> org.apache.zookeeper.ClientCnxn$SendThread.run(
> > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > >>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection
> to
> > >>>>>>>>>>>>>> server
> > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> authenticate
> > >>>>>>>>>>>>>> using
> > >>>>>>>>>>>>>> SASL
> > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > >>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000
> for
> > >>>>>>>>>>>>>> server
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>  null,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> > >>>>>>>>>>>> reconnect
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > >>>>>>>>>>>>>>          at
> sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > >>>>>>>>>>>>>> Method)
> > >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > >>>>>>>>>>>>>> doTransport(
> > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > >>>>>>>>>>>>>>          at
> org.apache.zookeeper.ClientCnxn$SendThread.run(
> > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > >>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection
> to
> > >>>>>>>>>>>>>> server
> > >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to
> authenticate
> > >>>>>>>>>>>>>> using
> > >>>>>>>>>>>>>> SASL
> > >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> > >>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000
> for
> > >>>>>>>>>>>>>> server
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>  null,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> > >>>>>>>>>>>> reconnect
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> > >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> > >>>>>>>>>>>>>>          at
> sun.nio.ch.SocketChannelImpl.checkConnect(Native
> > >>>>>>>>>>>>>> Method)
> > >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> > >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> > >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> > >>>>>>>>>>>>>> doTransport(
> > >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> > >>>>>>>>>>>>>>          at
> org.apache.zookeeper.ClientCnxn$SendThread.run(
> > >>>>>>>>>>>>>> ClientCnxn.java:1081)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> -Jaikiran
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>    --
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>> Ewen
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>  --
> > >>>>>> -- Guozhang
> > >>>>>>
> > >>>>>
> > >>>>
> > >
> >
> >
> > --
> > -- Guozhang
>

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Harsha <ka...@harsha.io>.
Is there any reason not to go with Apache Curator (http://curator.apache.org/)?
-Harsha
On Tue, Feb 3, 2015, at 09:55 PM, Guozhang Wang wrote:
> I am also +1 on Neha's suggestion that "At some point, if we find
> ourselves
> fiddling too much with ZkClient, it wouldn't hurt to write our own little
> zookeeper client wrapper." since we have accumulated a bunch of issues
> with
> zkClient which take a long time to be resolved, if ever, so we ended up with
> some hacky ways of handling zkClient errors.
> 
> Guozhang
> 
> On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <ja...@gmail.com>
> wrote:
> 
> > Yes, that's the plan :)
> >
> > -Jaikiran
> >
> > On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> >
> >> So I think the current plan is:
> >> 1. Add timeout in zkclient
> >> 2. Ask zkclient to release new version (we need it for few other things
> >> too)
> >> 3. Rebase on new zkclient
> >> 4. Fix this jira and the few others that were waiting for the new zkclient
> >>
> >> Does that make sense?
> >>
> >> Gwen
> >>
> >> On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <ja...@gmail.com>
> >> wrote:
> >>
> >>> I just heard back from Stefan, who manages the ZkClient repo and he
> >>> seems to
> >>> be open to have these changes be part of ZkClient project. I'll be
> >>> creating
> >>> a pull request for that project to have it reviewed and merged. Although
> >>> I
> >>> haven't heard of exact release plans, Stefan's reply did indicate that
> >>> the
> >>> project could be released after this change is merged.
> >>>
> >>> -Jaikiran
> >>>
> >>> On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
> >>>
> >>>> Thanks for pointing to that repo!
> >>>>
> >>>> I just had a look at it and it appears that the project isn't much
> >>>> active
> >>>> (going by the lack of activity). The latest contribution is from Gwen
> >>>> and
> >>>> that was around 3 months back. I haven't found release plans for that
> >>>> project or a place to ask about it (filing an issue doesn't seem right
> >>>> to
> >>>> ask this question). So I'll get in touch with the repo owner and see
> >>>> what
> >>>> his plans for the project are.
> >>>>
> >>>> -Jaikiran
> >>>>
> >>>> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
> >>>>
> >>>>> I did!
> >>>>>
> >>>>> Thanks for clarifying :)
> >>>>>
> >>>>> The client that is part of Zookeeper itself actually does support
> >>>>> timeouts.
> >>>>>
> >>>>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wa...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Jaikiran,
> >>>>>>
> >>>>>> I think Gwen was talking about contributing to ZkClient project:
> >>>>>>
> >>>>>> https://github.com/sgroschupf/zkclient
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <
> >>>>>> jai.forums2013@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>  Hi Gwen,
> >>>>>>>
> >>>>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a
> >>>>>>> complete
> >>>>>>> replacement.
> >>>>>>>
> >>>>>>> As for contributing to Zookeeper, yes that indeed in on my mind, but
> >>>>>>> I
> >>>>>>> haven't yet had a chance to really look deeper into Zookeeper or get
> >>>>>>> in
> >>>>>>> touch with their dev team to try and explain this potential
> >>>>>>> improvement
> >>>>>>> to
> >>>>>>> them. I have no objection to contributing this or something similar
> >>>>>>> to
> >>>>>>> Zookeeper directly. I think I should be able to bring this up in the
> >>>>>>> Zookeeper dev forum, sometime soon in the next few weekends.
> >>>>>>>
> >>>>>>> -Jaikiran
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
> >>>>>>>
> >>>>>>>  It looks like the new KafkaZkClient is a wrapper around ZkClient,
> >>>>>>>> but
> >>>>>>>> not a replacement. Did I get it right?
> >>>>>>>>
> >>>>>>>> I think a wrapper for ZkClient can be useful - for example
> >>>>>>>> KAFKA-1664
> >>>>>>>> can also use one.
> >>>>>>>>
> >>>>>>>> However, I'm wondering why not contribute the fix directly to
> >>>>>>>> ZKClient
> >>>>>>>> project and ask for a release that contains the fix?
> >>>>>>>> This will benefit other users of the project who may also need a
> >>>>>>>> timeout (thats pretty basic...)
> >>>>>>>>
> >>>>>>>> As an alternative, if we don't want to collaborate with ZKClient for
> >>>>>>>> some reason, forking the project into Kafka will probably give us
> >>>>>>>> more
> >>>>>>>> control than wrappers and without much downside.
> >>>>>>>>
> >>>>>>>> Just a thought.
> >>>>>>>>
> >>>>>>>> Gwen
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai
> >>>>>>>> <ja...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>  Neha, Ewen (and others), my initial attempt to solve this is
> >>>>>>>>> uploaded
> >>>>>>>>> here
> >>>>>>>>> https://reviews.apache.org/r/30477/. It solves the shutdown
> >>>>>>>>> problem
> >>>>>>>>> and
> >>>>>>>>> now
> >>>>>>>>> the server shuts down even when Zookeeper has gone down before the
> >>>>>>>>> Kafka
> >>>>>>>>> server.
> >>>>>>>>>
> >>>>>>>>> I went with the approach of introducing a custom (enhanced)
> >>>>>>>>> ZkClient
> >>>>>>>>> which
> >>>>>>>>> for now allows time outs to be optionally specified for certain
> >>>>>>>>> operations.
> >>>>>>>>> I intentionally haven't forced the use of this new KafkaZkClient
> >>>>>>>>> all
> >>>>>>>>> over
> >>>>>>>>> the code and instead for now have just used it in the KafkaServer.
> >>>>>>>>>
> >>>>>>>>> Does this patch look like something worth using?
> >>>>>>>>>
> >>>>>>>>> -Jaikiran
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
> >>>>>>>>>
> >>>>>>>>>  Ewen is right. ZkClient APIs are blocking and the right fix for
> >>>>>>>>>> this
> >>>>>>>>>> seems
> >>>>>>>>>> to be patching ZkClient. At some point, if we find ourselves
> >>>>>>>>>> fiddling
> >>>>>>>>>> too
> >>>>>>>>>> much with ZkClient, it wouldn't hurt to write our own little
> >>>>>>>>>> zookeeper
> >>>>>>>>>> client wrapper.
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
> >>>>>>>>>> <ew...@confluent.io>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>    Looks like a bug to me -- the underlying ZK library wraps a
> >>>>>>>>>> lot of
> >>>>>>>>>>
> >>>>>>>>>>> blocking
> >>>>>>>>>>> method implementations with waitUntilConnected() calls without
> >>>>>>>>>>> any
> >>>>>>>>>>> timeouts. Ideally we could just add a version of
> >>>>>>>>>>> ZkUtils.getController()
> >>>>>>>>>>> with a timeout, but I don't see an easy way to accomplish that
> >>>>>>>>>>> with
> >>>>>>>>>>> ZkClient.
> >>>>>>>>>>>
> >>>>>>>>>>> There's at least one other call to ZkUtils besides the one in the
> >>>>>>>>>>> stacktrace you gave that would cause the same issue, possibly
> >>>>>>>>>>> more
> >>>>>>>>>>> that
> >>>>>>>>>>> aren't directly called in that method. One ugly solution would be
> >>>>>>>>>>> to
> >>>>>>>>>>> use
> >>>>>>>>>>> an
> >>>>>>>>>>> extra thread during shutdown to trigger timeouts, but I'd imagine
> >>>>>>>>>>> we
> >>>>>>>>>>> probably have other threads that could end up blocking in similar
> >>>>>>>>>>> ways.
> >>>>>>>>>>>
> >>>>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to
> >>>>>>>>>>> track
> >>>>>>>>>>> the
> >>>>>>>>>>> issue.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
> >>>>>>>>>>> jai.forums2013@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>    The main culprit is this thread which goes into "forever retry
> >>>>>>>>>>>
> >>>>>>>>>>>> connection
> >>>>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a Ctrl + C)
> >>>>>>>>>>>> after
> >>>>>>>>>>>> zookeeper has already been shutdown. I have attached the
> >>>>>>>>>>>> complete
> >>>>>>>>>>>> thread
> >>>>>>>>>>>> dump, but I don't know if it will be delivered to the mailing
> >>>>>>>>>>>> list.
> >>>>>>>>>>>>
> >>>>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on
> >>>>>>>>>>>> condition
> >>>>>>>>>>>> [0x6ad69000]
> >>>>>>>>>>>>        java.lang.Thread.State: TIMED_WAITING (parking)
> >>>>>>>>>>>>         at sun.misc.Unsafe.park(Native Method)
> >>>>>>>>>>>>         - parking to wait for  <0x70a93368> (a
> >>>>>>>>>>>> java.util.concurrent.locks.
> >>>>>>>>>>>> AbstractQueuedSynchronizer$ConditionObject)
> >>>>>>>>>>>>         at java.util.concurrent.locks.LockSupport.parkUntil(
> >>>>>>>>>>>> LockSupport.java:267)
> >>>>>>>>>>>>         at java.util.concurrent.locks.
> >>>>>>>>>>>> AbstractQueuedSynchronizer$
> >>>>>>>>>>>> ConditionObject.awaitUntil(AbstractQueuedSynchronizer.
> >>>>>>>>>>>> java:2130)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.
> >>>>>>>>>>>> java:636)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> >>>>>>>>>>>> java:619)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.
> >>>>>>>>>>>> java:615)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>>
> >>>>>>>>>>>>  org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.
> >>>>>>>>>>> java:679)
> >>>>>>>>>>>
> >>>>>>>>>>>          at org.I0Itec.zkclient.ZkClient.
> >>>>>>>>>>>> readData(ZkClient.java:766)
> >>>>>>>>>>>>         at org.I0Itec.zkclient.ZkClient.
> >>>>>>>>>>>> readData(ZkClient.java:761)
> >>>>>>>>>>>>         at
> >>>>>>>>>>>> kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
> >>>>>>>>>>>>         at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
> >>>>>>>>>>>>         at kafka.server.KafkaServer.kafka$server$KafkaServer$$
> >>>>>>>>>>>> controlledShutdown(KafkaServer.scala:194)
> >>>>>>>>>>>>         at kafka.server.KafkaServer$$
> >>>>>>>>>>>> anonfun$shutdown$1.apply$mcV$
> >>>>>>>>>>>> sp(KafkaServer.scala:269)
> >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:172)
> >>>>>>>>>>>>         at kafka.utils.Logging$class.
> >>>>>>>>>>>> swallowWarn(Logging.scala:92)
> >>>>>>>>>>>>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> >>>>>>>>>>>>         at kafka.utils.Logging$class.swallow(Logging.scala:94)
> >>>>>>>>>>>>         at kafka.utils.Utils$.swallow(Utils.scala:45)
> >>>>>>>>>>>>         at kafka.server.KafkaServer.shutdown(KafkaServer.scala:
> >>>>>>>>>>>> 269)
> >>>>>>>>>>>>         at kafka.server.KafkaServerStartable.shutdown(
> >>>>>>>>>>>> KafkaServerStartable.scala:42)
> >>>>>>>>>>>>         at kafka.Kafka$$anon$1.run(Kafka.scala:42)
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Jaikiran
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>    For a clean shutdown, the broker tries to talk to the
> >>>>>>>>>>>> controller
> >>>>>>>>>>>> and
> >>>>>>>>>>>> also
> >>>>>>>>>>>> issues reads to zookeeper. Possibly that is where it tries to
> >>>>>>>>>>>>
> >>>>>>>>>>>>> reconnect
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  to
> >>>>>>>>>>>> zk. It will help to look at the thread dump.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>> Neha
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
> >>>>>>>>>>>>> jai.forums2013@gmail.com
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>      I was just playing around with the RC2 of 0.8.2 and
> >>>>>>>>>>>>> noticed
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>> if I
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  shutdown zookeeper first I can't shutdown Kafka server at all
> >>>>>>>>>>>>>> since
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>> goes
> >>>>>>>>>>>>>> into a never ending attempt to reconnect with zookeeper. I had
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>> kill
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>> Kafka process to stop it. I tried it against trunk too and
> >>>>>>>>>>>>>> there
> >>>>>>>>>>>>>> too I
> >>>>>>>>>>>>>> see
> >>>>>>>>>>>>>> the same issue. Should I file a JIRA for this and see if I can
> >>>>>>>>>>>>>> come
> >>>>>>>>>>>>>> up
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>> a patch?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at
> >>>>>>>>>>>>>> trying
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>> reconnect. I've a thread dump too which shows that the other
> >>>>>>>>>>>>>> thread
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  which
> >>>>>>>>>>>>>
> >>>>>>>>>>>> is trying to complete a controlled shutdown of Kafka is blocked
> >>>>>>>>>>>>
> >>>>>>>>>>>>> forever
> >>>>>>>>>>>>>> for
> >>>>>>>>>>>>>> the zookeeper to be up. I can attach it to the JIRA.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  null,
> >>>>>>>>>>>>>
> >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>>> reconnect
> >>>>>>>>>>>>
> >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
> >>>>>>>>>>>>>> using
> >>>>>>>>>>>>>> SASL
> >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  null,
> >>>>>>>>>>>>>
> >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>>> reconnect
> >>>>>>>>>>>>
> >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
> >>>>>>>>>>>>>> using
> >>>>>>>>>>>>>> SASL
> >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  null,
> >>>>>>>>>>>>>
> >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>>> reconnect
> >>>>>>>>>>>>
> >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate
> >>>>>>>>>>>>>> using
> >>>>>>>>>>>>>> SASL
> >>>>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for
> >>>>>>>>>>>>>> server
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  null,
> >>>>>>>>>>>>>
> >>>>>>>>>>>> unexpected error, closing socket connection and attempting
> >>>>>>>>>>>> reconnect
> >>>>>>>>>>>>
> >>>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
> >>>>>>>>>>>>>> java.net.ConnectException: Connection refused
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native
> >>>>>>>>>>>>>> Method)
> >>>>>>>>>>>>>>          at sun.nio.ch.SocketChannelImpl.finishConnect(
> >>>>>>>>>>>>>> SocketChannelImpl.java:739)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxnSocketNIO.
> >>>>>>>>>>>>>> doTransport(
> >>>>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
> >>>>>>>>>>>>>>          at org.apache.zookeeper.ClientCnxn$SendThread.run(
> >>>>>>>>>>>>>> ClientCnxn.java:1081)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Jaikiran
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>    --
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>> Ewen
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  --
> >>>>>> -- Guozhang
> >>>>>>
> >>>>>
> >>>>
> >
> 
> 
> -- 
> -- Guozhang

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Guozhang Wang <wa...@gmail.com>.
I am also +1 on Neha's suggestion that "At some point, if we find ourselves
fiddling too much with ZkClient, it wouldn't hurt to write our own little
zookeeper client wrapper." We have accumulated a bunch of issues with
ZkClient that take a long time to be resolved, if ever, so we have ended up
with some hacky ways of handling ZkClient errors.

Guozhang
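
As a rough illustration of what such a "little wrapper" could look like, the sketch below bounds an arbitrary blocking call by running it on a helper thread and giving up after a timeout. It is only a sketch under stated assumptions: BoundedCall and callWithTimeout are hypothetical names, not part of Kafka or ZkClient, and the lambda in the usage note stands in for a blocking ZooKeeper read.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not Kafka or ZkClient code): bounds a blocking call with a timeout. */
final class BoundedCall {
    private BoundedCall() {}

    /**
     * Runs the call on a helper thread and waits at most timeoutMs for its result.
     * Returns Optional.empty() on timeout or interruption (and also if the call returns null).
     */
    public static <T> Optional<T> callWithTimeout(Callable<T> call, long timeoutMs) {
        // Daemon thread, so a call that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker that is still blocked
            return Optional.empty();
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("wrapped call failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller on the shutdown path could then write something like BoundedCall.callWithTimeout(() -> readControllerId(), 5000), where readControllerId is a placeholder for the blocking read, and skip the controlled shutdown when the result is empty. Creating one thread per call is wasteful; a real wrapper would more likely share an executor or push the timeout into the client itself.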

On Tue, Feb 3, 2015 at 7:47 PM, Jaikiran Pai <ja...@gmail.com>
wrote:

> Yes, that's the plan :)
>
> -Jaikiran
>
> On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
>
>> So I think the current plan is:
>> 1. Add timeout in zkclient
>> 2. Ask zkclient to release new version (we need it for a few other things too)
>> 3. Rebase on new zkclient
>> 4. Fix this jira and the few others that were waiting for the new zkclient
>>
>> Does that make sense?
>>
>> Gwen


-- 
-- Guozhang

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Jaikiran Pai <ja...@gmail.com>.
Yes, that's the plan :)

-Jaikiran
On Wednesday 04 February 2015 12:33 AM, Gwen Shapira wrote:
> So I think the current plan is:
> 1. Add timeout in zkclient
> 2. Ask zkclient to release new version (we need it for a few other things too)
> 3. Rebase on new zkclient
> 4. Fix this jira and the few others that were waiting for the new zkclient
>
> Does that make sense?
>
> Gwen


Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Gwen Shapira <gs...@cloudera.com>.
So I think the current plan is:
1. Add timeout in zkclient
2. Ask zkclient to release new version (we need it for a few other things too)
3. Rebase on new zkclient
4. Fix this jira and the few others that were waiting for the new zkclient

Does that make sense?

Gwen
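
For reference, the idea behind step 1 (a timeout instead of retrying
forever) can be sketched roughly as below. This is only an illustration
under stated assumptions, not the actual ZkClient patch: the class
BoundedRetry and the method retryWithTimeout are hypothetical names that
do not exist in ZkClient or Kafka. The sketch retries an operation until
a deadline passes and then gives up with a TimeoutException.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only; not part of ZkClient or Kafka.
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls op repeatedly until it succeeds or timeoutMs has elapsed.
     * On timeout, throws TimeoutException with the last failure as its cause.
     */
    static <T> T retryWithTimeout(Callable<T> op, long timeoutMs, long backoffMs)
            throws Exception {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Preserve the interrupt so a shutdown request is not swallowed.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    TimeoutException timeout = new TimeoutException(
                            "operation did not succeed within " + timeoutMs + " ms");
                    timeout.initCause(e);
                    throw timeout;
                }
                // Sleep for the backoff, but never past the deadline.
                long sleepMs = Math.min(backoffMs, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
                Thread.sleep(Math.max(1L, sleepMs));
            }
        }
    }
}
```

A caller on the shutdown path could catch the TimeoutException and skip
the controlled shutdown step. Note that this only helps when each attempt
fails fast (as a refused connection does); an attempt that itself blocks
indefinitely needs the timeout inside the client, which is what step 1
proposes.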

On Mon, Feb 2, 2015 at 8:33 PM, Jaikiran Pai <ja...@gmail.com> wrote:
> I just heard back from Stefan, who manages the ZkClient repo, and he seems
> open to having these changes become part of the ZkClient project. I'll be
> creating a pull request for that project to have it reviewed and merged.
> Although I haven't heard of exact release plans, Stefan's reply did indicate
> that the project could be released after this change is merged.
>
> -Jaikiran

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Jaikiran Pai <ja...@gmail.com>.
I just heard back from Stefan, who manages the ZkClient repo, and he
seems open to having these changes become part of the ZkClient project.
I'll be creating a pull request for that project to have it reviewed and
merged. Although I haven't heard of exact release plans, Stefan's reply
did indicate that the project could be released after this change is merged.

-Jaikiran
On Tuesday 03 February 2015 09:03 AM, Jaikiran Pai wrote:
> Thanks for pointing to that repo!
>
> I just had a look at it and it appears that the project isn't very
> active (going by the lack of recent activity). The latest contribution
> is from Gwen and that was around 3 months back. I haven't found release
> plans for that project or a place to ask about them (filing an issue
> doesn't seem like the right way to ask this question). So I'll get in
> touch with the repo owner and see what his plans for the project are.
>
> -Jaikiran
>
> On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
>> I did!
>>
>> Thanks for clarifying :)
>>
>> The client that is part of Zookeeper itself actually does support 
>> timeouts.
>>
>> On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wa...@gmail.com> 
>> wrote:
>>> Hi Jaikiran,
>>>
>>> I think Gwen was talking about contributing to ZkClient project:
>>>
>>> https://github.com/sgroschupf/zkclient
>>>
>>> Guozhang
>>>
>>>
>>> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <ja...@gmail.com>
>>> wrote:
>>>
>>>> Hi Gwen,
>>>>
>>>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
>>>> replacement.
>>>>
>>>> As for contributing to Zookeeper, yes that is indeed on my mind, but I
>>>> haven't yet had a chance to really look deeper into Zookeeper or 
>>>> get in
>>>> touch with their dev team to try and explain this potential 
>>>> improvement to
>>>> them. I have no objection to contributing this or something similar to
>>>> Zookeeper directly. I think I should be able to bring this up in the
>>>> Zookeeper dev forum, sometime soon in the next few weekends.
>>>>
>>>> -Jaikiran
>>>>
>>>>
>>>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>>>
>>>>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>>>>> not a replacement. Did I get it right?
>>>>>
>>>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>>>>> can also use one.
>>>>>
>>>>> However, I'm wondering why not contribute the fix directly to 
>>>>> ZKClient
>>>>> project and ask for a release that contains the fix?
>>>>> This will benefit other users of the project who may also need a
>>>>> timeout (that's pretty basic...)
>>>>>
>>>>> As an alternative, if we don't want to collaborate with ZKClient for
>>>>> some reason, forking the project into Kafka will probably give us 
>>>>> more
>>>>> control than wrappers and without much downside.
>>>>>
>>>>> Just a thought.
>>>>>
>>>>> Gwen
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai 
>>>>> <ja...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Neha, Ewen (and others), my initial attempt to solve this is 
>>>>>> uploaded
>>>>>> here
>>>>>> https://reviews.apache.org/r/30477/. It solves the shutdown 
>>>>>> problem and
>>>>>> now
>>>>>> the server shuts down even when Zookeeper has gone down before 
>>>>>> the Kafka
>>>>>> server.
>>>>>>
>>>>>> I went with the approach of introducing a custom (enhanced) ZkClient
>>>>>> which
>>>>>> for now allows time outs to be optionally specified for certain
>>>>>> operations.
>>>>>> I intentionally haven't forced the use of this new KafkaZkClient 
>>>>>> all over
>>>>>> the code and instead for now have just used it in the KafkaServer.
>>>>>>
>>>>>> Does this patch look like something worth using?
>>>>>>
>>>>>> -Jaikiran
>>>>>>
>>>>>>
>>>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>>>>
>>>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for 
>>>>>>> this
>>>>>>> seems
>>>>>>> to be patching ZkClient. At some point, if we find ourselves 
>>>>>>> fiddling
>>>>>>> too
>>>>>>> much with ZkClient, it wouldn't hurt to write our own little 
>>>>>>> zookeeper
>>>>>>> client wrapper.
>>>>>>>
>>>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>>>>>>> <ew...@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>   Looks like a bug to me -- the underlying ZK library wraps a 
>>>>>>> lot of
>>>>>>>> blocking
>>>>>>>> method implementations with waitUntilConnected() calls without any
>>>>>>>> timeouts. Ideally we could just add a version of
>>>>>>>> ZkUtils.getController()
>>>>>>>> with a timeout, but I don't see an easy way to accomplish that 
>>>>>>>> with
>>>>>>>> ZkClient.
>>>>>>>>
>>>>>>>> There's at least one other call to ZkUtils besides the one in the
>>>>>>>> stacktrace you gave that would cause the same issue, possibly 
>>>>>>>> more that
>>>>>>>> aren't directly called in that method. One ugly solution would 
>>>>>>>> be to
>>>>>>>> use
>>>>>>>> an
>>>>>>>> extra thread during shutdown to trigger timeouts, but I'd 
>>>>>>>> imagine we
>>>>>>>> probably have other threads that could end up blocking in 
>>>>>>>> similar ways.
>>>>>>>>
>>>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to 
>>>>>>>> track the
>>>>>>>> issue.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
>>>>>>>> jai.forums2013@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>   The main culprit is this thread which goes into "forever retry
>>>>>>>>> connection
>>>>>>>>> to a closed zookeeper" when I shutdown Kafka (via a Ctrl + C) 
>>>>>>>>> after
>>>>>>>>> zookeeper has already been shutdown. I have attached the complete
>>>>>>>>> thread
>>>>>>>>> dump, but I don't know if it will be delivered to the mailing 
>>>>>>>>> list.
>>>>>>>>>
>>>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition
>>>>>>>>> [0x6ad69000]
>>>>>>>>>       java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>>>        at sun.misc.Unsafe.park(Native Method)
>>>>>>>>>        - parking to wait for  <0x70a93368> (a
>>>>>>>>> java.util.concurrent.locks.
>>>>>>>>> AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>>>        at java.util.concurrent.locks.LockSupport.parkUntil(
>>>>>>>>> LockSupport.java:267)
>>>>>>>>>        at java.util.concurrent.locks.AbstractQueuedSynchronizer$
>>>>>>>>> ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>>>>        at
>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636) 
>>>>>>>>>
>>>>>>>>>        at
>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619) 
>>>>>>>>>
>>>>>>>>>        at
>>>>>>>>> org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615) 
>>>>>>>>>
>>>>>>>>>        at
>>>>>>>>>
>>>>>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679) 
>>>>>>>>
>>>>>>>>
>>>>>>>>>        at 
>>>>>>>>> org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>>>>        at 
>>>>>>>>> org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>>>>        at 
>>>>>>>>> kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>>>        at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>>>        at kafka.server.KafkaServer.kafka$server$KafkaServer$$
>>>>>>>>> controlledShutdown(KafkaServer.scala:194)
>>>>>>>>>        at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$
>>>>>>>>> sp(KafkaServer.scala:269)
>>>>>>>>>        at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>>>        at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>>>>        at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>>>        at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>>>        at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>>>        at 
>>>>>>>>> kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>>>>        at kafka.server.KafkaServerStartable.shutdown(
>>>>>>>>> KafkaServerStartable.scala:42)
>>>>>>>>>        at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>>>
>>>>>>>>> -Jaikiran
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>>>>
>>>>>>>>>   For a clean shutdown, the broker tries to talk to the 
>>>>>>>>> controller and
>>>>>>>>> also
>>>>>>>>> issues reads to zookeeper. Possibly that is where it tries to
>>>>>>>>>> reconnect
>>>>>>>>>>
>>>>>>>>> to
>>>>>>>>> zk. It will help to look at the thread dump.
>>>>>>>>>> Thanks
>>>>>>>>>> Neha
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
>>>>>>>>>> jai.forums2013@gmail.com
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>     I was just playing around with the RC2 of 0.8.2 and 
>>>>>>>>>> noticed that
>>>>>>>>>> if I
>>>>>>>>>>
>>>>>>>>>>> shutdown zookeeper first I can't shutdown Kafka server at 
>>>>>>>>>>> all since
>>>>>>>>>>> it
>>>>>>>>>>> goes
>>>>>>>>>>> into a never ending attempt to reconnect with zookeeper. I 
>>>>>>>>>>> had to
>>>>>>>>>>> kill
>>>>>>>>>>> the
>>>>>>>>>>> Kafka process to stop it. I tried it against trunk too and 
>>>>>>>>>>> there
>>>>>>>>>>> too I
>>>>>>>>>>> see
>>>>>>>>>>> the same issue. Should I file a JIRA for this and see if I 
>>>>>>>>>>> can come
>>>>>>>>>>> up
>>>>>>>>>>> with
>>>>>>>>>>> a patch?
>>>>>>>>>>>
>>>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at 
>>>>>>>>>>> trying
>>>>>>>>>>> to
>>>>>>>>>>> reconnect. I've a thread dump too which shows that the other 
>>>>>>>>>>> thread
>>>>>>>>>>>
>>>>>>>>>> which
>>>>>>>>> is trying to complete a controlled shutdown of Kafka is blocked
>>>>>>>>>>> forever
>>>>>>>>>>> for
>>>>>>>>>>> the zookeeper to be up. I can attach it to the JIRA.
>>>>>>>>>>>
>>>>>>>>>>> 2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for 
>>>>>>>>>>> server
>>>>>>>>>>>
>>>>>>>>>> null,
>>>>>>>>> unexpected error, closing socket connection and attempting 
>>>>>>>>> reconnect
>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native 
>>>>>>>>>>> Method)
>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>         at 
>>>>>>>>>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to 
>>>>>>>>>>> server
>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate 
>>>>>>>>>>> using
>>>>>>>>>>> SASL
>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for 
>>>>>>>>>>> server
>>>>>>>>>>>
>>>>>>>>>> null,
>>>>>>>>> unexpected error, closing socket connection and attempting 
>>>>>>>>> reconnect
>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native 
>>>>>>>>>>> Method)
>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>         at 
>>>>>>>>>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to 
>>>>>>>>>>> server
>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate 
>>>>>>>>>>> using
>>>>>>>>>>> SASL
>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for 
>>>>>>>>>>> server
>>>>>>>>>>>
>>>>>>>>>> null,
>>>>>>>>> unexpected error, closing socket connection and attempting 
>>>>>>>>> reconnect
>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native 
>>>>>>>>>>> Method)
>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>         at 
>>>>>>>>>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to 
>>>>>>>>>>> server
>>>>>>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate 
>>>>>>>>>>> using
>>>>>>>>>>> SASL
>>>>>>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for 
>>>>>>>>>>> server
>>>>>>>>>>>
>>>>>>>>>> null,
>>>>>>>>> unexpected error, closing socket connection and attempting 
>>>>>>>>> reconnect
>>>>>>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native 
>>>>>>>>>>> Method)
>>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>>>         at 
>>>>>>>>>>> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
>>>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>>>         at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Jaikiran
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>   --
>>>>>>>> Thanks,
>>>>>>>> Ewen
>>>>>>>>
>>>>>>>>
>>>
>>> -- 
>>> -- Guozhang
>
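
For reference, the "extra thread during shutdown to trigger timeouts" idea raised in this thread could look roughly like the sketch below. It is only an illustration using java.util.concurrent: TimedShutdownStep, callWithTimeout and the sleeping blockedRead are made-up names standing in for a ZooKeeper read that never returns, and none of this is taken from the Kafka or ZkClient code. It also assumes the blocked call reacts to thread interruption, which may not hold for every call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedShutdownStep {

    /**
     * Runs a potentially blocking step on a helper thread and gives up after
     * timeoutMs. Returns the step's result, or the fallback on timeout or failure.
     */
    static <T> T callWithTimeout(Callable<T> step, long timeoutMs, T fallback) {
        ExecutorService helper = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "shutdown-helper");
            t.setDaemon(true); // a stuck helper must not keep the JVM alive
            return t;
        });
        Future<T> result = helper.submit(step);
        try {
            return result.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the helper thread
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the step itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        } finally {
            helper.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a read that blocks while ZooKeeper is down.
        Callable<String> blockedRead = () -> {
            Thread.sleep(60_000);
            return "controller-id";
        };
        String controller = callWithTimeout(blockedRead, 200, null);
        System.out.println(controller == null
                ? "gave up waiting, continuing shutdown"
                : "controller: " + controller);
    }
}
```

This only bounds the one call that is wrapped. Any other thread that blocks on the same kind of wait would need the same treatment, which is the drawback already pointed out for this approach.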


Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Jaikiran Pai <ja...@gmail.com>.
Thanks for pointing to that repo!

I just had a look at it and the project doesn't appear to be very 
active. The latest contribution is from Gwen, around 3 months back. I 
haven't found any release plans for the project or a place to ask about 
them (filing an issue doesn't seem like the right way to ask this 
question). So I'll get in touch with the repo owner and see what his 
plans for the project are.

-Jaikiran

On Monday 02 February 2015 11:33 PM, Gwen Shapira wrote:
> I did!
>
> Thanks for clarifying :)
>
> The client that is part of Zookeeper itself actually does support timeouts.


Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Gwen Shapira <gs...@cloudera.com>.
I did!

Thanks for clarifying :)

The client that is part of Zookeeper itself actually does support timeouts.
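
As a rough illustration of the difference between a wait with a timeout and the unbounded wait behind this hang, here is a minimal sketch using only java.util.concurrent. It does not use the ZooKeeper or ZkClient APIs; BoundedConnectWait, markConnected and waitUntilConnected are names made up for the example, and the "connected" flag simply stands in for whatever connection state a real client tracks.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class BoundedConnectWait {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    /** Called by whatever observes the connection coming up (hypothetical). */
    void markConnected() {
        lock.lock();
        try {
            connected = true;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Waits at most the given time; returns false if still not connected. */
    boolean waitUntilConnected(long time, TimeUnit unit) {
        Date deadline = new Date(System.currentTimeMillis() + unit.toMillis(time));
        lock.lock();
        try {
            boolean beforeDeadline = true;
            while (!connected) {
                if (!beforeDeadline) {
                    return false; // deadline passed, give up instead of retrying forever
                }
                // awaitUntil returns false once the deadline has elapsed
                beforeDeadline = stateChanged.awaitUntil(deadline);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        BoundedConnectWait state = new BoundedConnectWait();
        // Nothing ever calls markConnected(), as if the server were down.
        boolean ok = state.waitUntilConnected(200, TimeUnit.MILLISECONDS);
        System.out.println("connected within timeout: " + ok);
    }
}
```

The caller gets a boolean back and can decide to skip the step and carry on with shutdown, rather than parking until the server returns.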

On Mon, Feb 2, 2015 at 9:54 AM, Guozhang Wang <wa...@gmail.com> wrote:
> Hi Jaikiran,
>
> I think Gwen was talking about contributing to ZkClient project:
>
> https://github.com/sgroschupf/zkclient
>
> Guozhang
>
>
> On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <ja...@gmail.com>
> wrote:
>
>> Hi Gwen,
>>
>> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
>> replacement.
>>
>> As for contributing to Zookeeper, yes that indeed is on my mind, but I
>> haven't yet had a chance to really look deeper into Zookeeper or get in
>> touch with their dev team to try and explain this potential improvement to
>> them. I have no objection to contributing this or something similar to
>> Zookeeper directly. I think I should be able to bring this up in the
>> Zookeeper dev forum, sometime soon in the next few weekends.
>>
>> -Jaikiran
>>
>>
>> On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
>>
>>> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
>>> not a replacement. Did I get it right?
>>>
>>> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
>>> can also use one.
>>>
>>> However, I'm wondering why not contribute the fix directly to ZKClient
>>> project and ask for a release that contains the fix?
>>> This will benefit other users of the project who may also need a
>>> timeout (that's pretty basic...)
>>>
>>> As an alternative, if we don't want to collaborate with ZKClient for
>>> some reason, forking the project into Kafka will probably give us more
>>> control than wrappers and without much downside.
>>>
>>> Just a thought.
>>>
>>> Gwen
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <ja...@gmail.com>
>>> wrote:
>>>
>>>> Neha, Ewen (and others), my initial attempt to solve this is uploaded
>>>> here https://reviews.apache.org/r/30477/. It solves the shutdown problem
>>>> and now the server shuts down even when Zookeeper has gone down before
>>>> the Kafka server.
>>>>
>>>> I went with the approach of introducing a custom (enhanced) ZkClient
>>>> which for now allows time outs to be optionally specified for certain
>>>> operations. I intentionally haven't forced the use of this new
>>>> KafkaZkClient all over the code and instead for now have just used it
>>>> in the KafkaServer.
>>>>
>>>> Does this patch look like something worth using?
>>>>
>>>> -Jaikiran
>>>>
>>>>
>>>> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>>>
>>>>> Ewen is right. ZkClient APIs are blocking and the right fix for this
>>>>> seems to be patching ZkClient. At some point, if we find ourselves
>>>>> fiddling too much with ZkClient, it wouldn't hurt to write our own
>>>>> little zookeeper client wrapper.
>>>>>
>>>>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>>>>> <ew...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> Looks like a bug to me -- the underlying ZK library wraps a lot of
>>>>>> blocking method implementations with waitUntilConnected() calls
>>>>>> without any timeouts. Ideally we could just add a version of
>>>>>> ZkUtils.getController() with a timeout, but I don't see an easy way
>>>>>> to accomplish that with ZkClient.
>>>>>>
>>>>>> There's at least one other call to ZkUtils besides the one in the
>>>>>> stacktrace you gave that would cause the same issue, possibly more that
>>>>>> aren't directly called in that method. One ugly solution would be to
>>>>>> use an extra thread during shutdown to trigger timeouts, but I'd
>>>>>> imagine we probably have other threads that could end up blocking in
>>>>>> similar ways.
>>>>>>
>>>>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
>>>>>> issue.
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <
>>>>>> jai.forums2013@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The main culprit is this thread which goes into "forever retry
>>>>>>> connection to a closed zookeeper" when I shutdown Kafka (via a
>>>>>>> Ctrl + C) after zookeeper has already been shutdown. I have attached
>>>>>>> the complete thread dump, but I don't know if it will be delivered to
>>>>>>> the mailing list.
>>>>>>>
>>>>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition [0x6ad69000]
>>>>>>>      java.lang.Thread.State: TIMED_WAITING (parking)
>>>>>>>       at sun.misc.Unsafe.park(Native Method)
>>>>>>>       - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>>>>       at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>>>>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>>>>       at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>>>>       at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>>>>       at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>>>>       at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>>>>       at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>>>>       at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>>>>       at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>>>>       at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>>>>       at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>>>>       at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>>>>       at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>>>>       at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>>>>       at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>>>>
>>>>>>> -Jaikiran
>>>>>>>
>>>>>>>
>>>>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>>>>
>>>>>>>> For a clean shutdown, the broker tries to talk to the controller and
>>>>>>>> also issues reads to zookeeper. Possibly that is where it tries to
>>>>>>>> reconnect to zk. It will help to look at the thread dump.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Neha
>>>>>>>>
>>>>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <
>>>>>>>> jai.forums2013@gmail.com
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I was just playing around with the RC2 of 0.8.2 and noticed that if I
>>>>>>>>> shutdown zookeeper first I can't shutdown Kafka server at all since it
>>>>>>>>> goes into a never ending attempt to reconnect with zookeeper. I had to
>>>>>>>>> kill the Kafka process to stop it. I tried it against trunk too and
>>>>>>>>> there too I see the same issue. Should I file a JIRA for this and see if
>>>>>>>>> I can come up with a patch?
>>>>>>>>>
>>>>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at trying to
>>>>>>>>> reconnect. I've a thread dump too which shows that the other thread
>>>>>>>>> which is trying to complete a controlled shutdown of Kafka is blocked
>>>>>>>>> forever for the zookeeper to be up. I can attach it to the JIRA.
>>>>>>>>>
>>>>>>>>> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
>>>>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>>>>        at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>>>>> SocketChannelImpl.java:739)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
>>>>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>>>>        at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>>>>> ClientCnxn.java:1081)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Jaikiran
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>> Thanks,
>>>>>> Ewen
>>>>>>
>>>>>>
>>>>>
>>
>
>
> --
> -- Guozhang

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Guozhang Wang <wa...@gmail.com>.
Hi Jaikiran,

I think Gwen was talking about contributing to the ZkClient project:

https://github.com/sgroschupf/zkclient

Guozhang


On Sun, Feb 1, 2015 at 5:30 AM, Jaikiran Pai <ja...@gmail.com>
wrote:

> Hi Gwen,
>
> Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete
> replacement.
>
> As for contributing to Zookeeper, yes, that is indeed on my mind, but I
> haven't yet had a chance to look deeper into Zookeeper or get in
> touch with their dev team to explain this potential improvement to
> them. I have no objection to contributing this or something similar to
> Zookeeper directly. I should be able to bring this up on the
> Zookeeper dev forum sometime in the next few weekends.
>
> -Jaikiran


-- 
-- Guozhang

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Jaikiran Pai <ja...@gmail.com>.
Hi Gwen,

Yes, the KafkaZkClient is a wrapper around ZkClient and not a complete 
replacement.

As for contributing to Zookeeper, yes, that is indeed on my mind, but I
haven't yet had a chance to look deeper into Zookeeper or get in
touch with their dev team to explain this potential improvement
to them. I have no objection to contributing this or something similar
to Zookeeper directly. I should be able to bring this up on the
Zookeeper dev forum sometime in the next few weekends.

-Jaikiran

On Sunday 01 February 2015 11:40 AM, Gwen Shapira wrote:
> It looks like the new KafkaZkClient is a wrapper around ZkClient, but
> not a replacement. Did I get it right?
>
> I think a wrapper for ZkClient can be useful - for example KAFKA-1664
> can also use one.
>
> However, I'm wondering why not contribute the fix directly to the ZkClient
> project and ask for a release that contains the fix?
> This will benefit other users of the project who may also need a
> timeout (that's pretty basic...)
>
> As an alternative, if we don't want to collaborate with ZKClient for
> some reason, forking the project into Kafka will probably give us more
> control than wrappers and without much downside.
>
> Just a thought.
>
> Gwen


Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Gwen Shapira <gs...@cloudera.com>.
It looks like the new KafkaZkClient is a wrapper around ZkClient, but
not a replacement. Did I get it right?

I think a wrapper for ZkClient can be useful - for example KAFKA-1664
can also use one.

However, I'm wondering why not contribute the fix directly to the ZkClient
project and ask for a release that contains the fix?
This will benefit other users of the project who may also need a
timeout (that's pretty basic...)
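
For illustration only, here is a minimal sketch of what a bounded wait
could look like. The thread dump posted earlier in this thread shows the
shutdown thread parked in Condition.awaitUntil() underneath
ZkClient.waitUntilConnected(). The ConnectionGate class and its methods
below are hypothetical names, not ZkClient's actual API; the only real
calls used are from java.util.concurrent.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical connection-state holder; NOT ZkClient's real class. */
final class ConnectionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    /** Called by whatever observes connection events (assumed to exist). */
    void setConnected(boolean value) {
        lock.lock();
        try {
            connected = value;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Waits until connected or until the deadline passes; false means timeout. */
    boolean waitUntilConnected(long time, TimeUnit unit) throws InterruptedException {
        Date deadline = new Date(System.currentTimeMillis() + unit.toMillis(time));
        lock.lock();
        try {
            boolean stillWaiting = true;
            while (!connected) {
                if (!stillWaiting) {
                    return false; // deadline elapsed without a connection
                }
                // awaitUntil returns false once the deadline has elapsed
                stillWaiting = stateChanged.awaitUntil(deadline);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

A caller that gets false back can give up (for example, skip the
controlled-shutdown step) instead of retrying forever.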

As an alternative, if we don't want to collaborate with ZKClient for
some reason, forking the project into Kafka will probably give us more
control than wrappers and without much downside.

Just a thought.

Gwen





On Sat, Jan 31, 2015 at 6:32 AM, Jaikiran Pai <ja...@gmail.com> wrote:
> Neha, Ewen (and others), my initial attempt to solve this is uploaded here
> https://reviews.apache.org/r/30477/. It solves the shutdown problem and now
> the server shuts down even when Zookeeper has gone down before the Kafka
> server.
>
> I went with the approach of introducing a custom (enhanced) ZkClient which
> for now allows timeouts to be optionally specified for certain operations.
> I intentionally haven't forced the use of this new KafkaZkClient all over
> the code and instead for now have just used it in the KafkaServer.
>
> Does this patch look like something worth using?
>
> -Jaikiran
>
>
> On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
>>
>> Ewen is right. ZkClient APIs are blocking and the right fix for this seems
>> to be patching ZkClient. At some point, if we find ourselves fiddling too
>> much with ZkClient, it wouldn't hurt to write our own little zookeeper
>> client wrapper.
>>
>> On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava
>> <ew...@confluent.io>
>> wrote:
>>
>>> Looks like a bug to me -- the underlying ZK library wraps a lot of blocking
>>> method implementations with waitUntilConnected() calls without any
>>> timeouts. Ideally we could just add a version of ZkUtils.getController()
>>> with a timeout, but I don't see an easy way to accomplish that with
>>> ZkClient.
>>>
>>> There's at least one other call to ZkUtils besides the one in the
>>> stacktrace you gave that would cause the same issue, possibly more that
>>> aren't directly called in that method. One ugly solution would be to use an
>>> extra thread during shutdown to trigger timeouts, but I'd imagine we
>>> probably have other threads that could end up blocking in similar ways.
>>>
>>> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
>>> issue.
>>>
>>>
>>> On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <ja...@gmail.com>
>>> wrote:
>>>
>>>> The main culprit is this thread, which goes into "forever retry connection
>>>> to a closed zookeeper" when I shut down Kafka (via Ctrl + C) after
>>>> zookeeper has already been shut down. I have attached the complete thread
>>>> dump, but I don't know if it will be delivered to the mailing list.
>>>>
>>>> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition
>>>> [0x6ad69000]
>>>>     java.lang.Thread.State: TIMED_WAITING (parking)
>>>>      at sun.misc.Unsafe.park(Native Method)
>>>>      - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>>>      at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>>>>      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>>>>      at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>>>>      at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>>>>      at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>>>>      at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>>>>      at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>      at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>      at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>>>>      at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>>>>      at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>>>>      at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>>>>      at kafka.utils.Utils$.swallow(Utils.scala:172)
>>>>      at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>>>>      at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>>>>      at kafka.utils.Logging$class.swallow(Logging.scala:94)
>>>>      at kafka.utils.Utils$.swallow(Utils.scala:45)
>>>>      at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>>>>      at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>>>>      at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>>>>
>>>> -Jaikiran
>>>>
>>>>
>>>> On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
>>>>
>>>>> For a clean shutdown, the broker tries to talk to the controller and also
>>>>> issues reads to zookeeper. Possibly that is where it tries to reconnect to
>>>>> zk. It will help to look at the thread dump.
>>>>>
>>>>> Thanks
>>>>> Neha
>>>>>
>>>>> On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <jai.forums2013@gmail.com
>>>>> wrote:
>>>>>
>>>>>   I was just playing around with the RC2 of 0.8.2 and noticed that if I
>>>>>>
>>>>>> shutdown zookeeper first I can't shutdown Kafka server at all since it
>>>>>> goes
>>>>>> into a never ending attempt to reconnect with zookeeper. I had to kill
>>>>>> the
>>>>>> Kafka process to stop it. I tried it against trunk too and there too I
>>>>>> see
>>>>>> the same issue. Should I file a JIRA for this and see if I can come up
>>>>>> with
>>>>>> a patch?
>>>>>>
>>>>>> FWIW, here's the unending (and IMO too frequent) attempts at trying to
>>>>>> reconnect. I've a thread dump too which shows that the other thread
>>>
>>> which
>>>>>>
>>>>>> is trying to complete a controlled shutdown of Kafka is blocked
>>>>>> forever
>>>>>> for
>>>>>> the zookeeper to be up. I can attach it to the JIRA.
>>>>>>
>>>>>> 2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server
>>>
>>> null,
>>>>>>
>>>>>> unexpected error, closing socket connection and attempting reconnect
>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>> java.net.ConnectException: Connection refused
>>>>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>> SocketChannelImpl.java:739)
>>>>>>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>> ClientCnxn.java:1081)
>>>>>> [2015-01-24 10:15:47,437] INFO Opening socket connection to server
>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server
>>>
>>> null,
>>>>>>
>>>>>> unexpected error, closing socket connection and attempting reconnect
>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>> java.net.ConnectException: Connection refused
>>>>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>> SocketChannelImpl.java:739)
>>>>>>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>> ClientCnxn.java:1081)
>>>>>> [2015-01-24 10:15:49,056] INFO Opening socket connection to server
>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server
>>>
>>> null,
>>>>>>
>>>>>> unexpected error, closing socket connection and attempting reconnect
>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>> java.net.ConnectException: Connection refused
>>>>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>> SocketChannelImpl.java:739)
>>>>>>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>> ClientCnxn.java:1081)
>>>>>> [2015-01-24 10:15:50,801] INFO Opening socket connection to server
>>>>>> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
>>>>>> (unknown error) (org.apache.zookeeper.ClientCnxn)
>>>>>> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null,
>>>>>> unexpected error, closing socket connection and attempting reconnect
>>>>>> (org.apache.zookeeper.ClientCnxn)
>>>>>> java.net.ConnectException: Connection refused
>>>>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(
>>>>>> SocketChannelImpl.java:739)
>>>>>>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
>>>>>> ClientCnxnSocketNIO.java:361)
>>>>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(
>>>>>> ClientCnxn.java:1081)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Jaikiran
>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>> Thanks,
>>> Ewen
>>>
>>
>>
>

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Jaikiran Pai <ja...@gmail.com>.
Neha, Ewen (and others), my initial attempt at a fix is uploaded at
https://reviews.apache.org/r/30477/. With the patch, the Kafka server now
shuts down even when ZooKeeper has gone down before it.

I went with the approach of introducing a custom (enhanced) ZkClient,
KafkaZkClient, which for now allows timeouts to be optionally specified
for certain operations. I intentionally haven't forced the use of this new
KafkaZkClient throughout the code; for now it is used only in KafkaServer.
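
To illustrate the idea of an optional timeout on a blocking operation, here is a minimal sketch using only the JDK. It is not the code from the review request and does not use the real ZkClient: the class TimedZkClientSketch, its methods, and the placeholder return value are all hypothetical names made up for this example.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a ZooKeeper client wrapper (not the real ZkClient).
class TimedZkClientSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition stateChanged = lock.newCondition();
    private boolean connected = false;

    // Called by whatever component tracks the connection state.
    void setConnected(boolean value) {
        lock.lock();
        try {
            connected = value;
            stateChanged.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits until connected or until the timeout elapses.
    // Returns true if connected, false if the deadline passed first.
    boolean waitUntilConnected(long timeout, TimeUnit unit) throws InterruptedException {
        Date deadline = new Date(System.currentTimeMillis() + unit.toMillis(timeout));
        lock.lock();
        try {
            boolean stillWaiting = true;
            while (!connected) {
                if (!stillWaiting) {
                    return false; // deadline elapsed and still not connected
                }
                stillWaiting = stateChanged.awaitUntil(deadline);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    // A read that gives up instead of blocking forever.
    String readData(String path, long timeoutMs) throws InterruptedException, TimeoutException {
        if (!waitUntilConnected(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("not connected within " + timeoutMs + " ms, giving up on " + path);
        }
        return "data-for-" + path; // placeholder for a real read
    }
}
```

With something along these lines, a caller on the shutdown path could catch the TimeoutException, log it, and carry on with the rest of the shutdown.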

Does this patch look like something worth using?

-Jaikiran

On Thursday 29 January 2015 10:41 PM, Neha Narkhede wrote:
> Ewen is right. ZkClient APIs are blocking and the right fix for this seems
> to be patching ZkClient. At some point, if we find ourselves fiddling too
> much with ZkClient, it wouldn't hurt to write our own little zookeeper
> client wrapper.
>


Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Neha Narkhede <ne...@confluent.io>.
Ewen is right. ZkClient APIs are blocking and the right fix for this seems
to be patching ZkClient. At some point, if we find ourselves fiddling too
much with ZkClient, it wouldn't hurt to write our own little zookeeper
client wrapper.

On Thu, Jan 29, 2015 at 12:57 AM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> Looks like a bug to me -- the underlying ZK library wraps a lot of blocking
> method implementations with waitUntilConnected() calls without any
> timeouts. Ideally we could just add a version of ZkUtils.getController()
> with a timeout, but I don't see an easy way to accomplish that with
> ZkClient.
>
> There's at least one other call to ZkUtils besides the one in the
> stacktrace you gave that would cause the same issue, possibly more that
> aren't directly called in that method. One ugly solution would be to use an
> extra thread during shutdown to trigger timeouts, but I'd imagine we
> probably have other threads that could end up blocking in similar ways.
>
> I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the
> issue.
>
>
> --
> Thanks,
> Ewen
>



-- 
Thanks,
Neha

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Looks like a bug to me -- the underlying ZK library wraps a lot of blocking
method implementations with waitUntilConnected() calls without any
timeouts. Ideally we could just add a version of ZkUtils.getController()
with a timeout, but I don't see an easy way to accomplish that with
ZkClient.

There's at least one other call to ZkUtils besides the one in the
stacktrace you gave that would cause the same issue, possibly more that
aren't directly called in that method. One ugly solution would be to use an
extra thread during shutdown to trigger timeouts, but I'd imagine we
probably have other threads that could end up blocking in similar ways.
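
For reference, the "extra thread during shutdown" idea could look roughly like the sketch below. It uses only java.util.concurrent; the class ShutdownTimeoutSketch and the method callWithTimeout are hypothetical names, and it assumes the blocking call responds to thread interruption, which would need to be verified for the real ZkClient calls.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ShutdownTimeoutSketch {
    // Runs a possibly-blocking task on a helper thread and returns the
    // fallback value if the task does not finish within timeoutMs.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMs, T fallback) {
        ExecutorService helper = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "shutdown-helper");
            t.setDaemon(true); // do not keep the JVM alive if the task never returns
            return t;
        });
        Future<T> future = helper.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the helper thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself failed
        } finally {
            helper.shutdownNow();
        }
    }
}
```

This only bounds the wait of the calling thread; as noted above, any other thread that blocks in the same way would need similar treatment.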

I filed https://issues.apache.org/jira/browse/KAFKA-1907 to track the issue.


On Mon, Jan 26, 2015 at 6:35 AM, Jaikiran Pai <ja...@gmail.com>
wrote:

> The main culprit is this thread which goes into "forever retry connection
> to a closed zookeeper" when I shutdown Kafka (via a Ctrl + C) after
> zookeeper has already been shutdown. I have attached the complete thread
> dump, but I don't know if it will be delivered to the mailing list.
>
> "Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition
> [0x6ad69000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
>     at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>     at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
>     at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
>     at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
>     at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
>     at kafka.utils.Utils$.swallow(Utils.scala:172)
>     at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>     at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>     at kafka.utils.Logging$class.swallow(Logging.scala:94)
>     at kafka.utils.Utils$.swallow(Utils.scala:45)
>     at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
>     at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
>     at kafka.Kafka$$anon$1.run(Kafka.scala:42)
>
> -Jaikiran
>
>


-- 
Thanks,
Ewen

Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Jaikiran Pai <ja...@gmail.com>.
The main culprit is this thread, which goes into a "forever retry
connection to a closed zookeeper" loop when I shut down Kafka (via Ctrl + C)
after zookeeper has already been shut down. I have attached the complete
thread dump, but I don't know if it will be delivered to the mailing list.

"Thread-2" prio=10 tid=0xb3305000 nid=0x4758 waiting on condition 
[0x6ad69000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  <0x70a93368> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
     at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
     at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
     at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
     at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
     at kafka.utils.ZkUtils$.readDataMaybeNull(ZkUtils.scala:456)
     at kafka.utils.ZkUtils$.getController(ZkUtils.scala:65)
     at kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:194)
     at kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:269)
     at kafka.utils.Utils$.swallow(Utils.scala:172)
     at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
     at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
     at kafka.utils.Logging$class.swallow(Logging.scala:94)
     at kafka.utils.Utils$.swallow(Utils.scala:45)
     at kafka.server.KafkaServer.shutdown(KafkaServer.scala:269)
     at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
     at kafka.Kafka$$anon$1.run(Kafka.scala:42)
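
The dump reports TIMED_WAITING rather than WAITING because the thread is parked in Condition.awaitUntil, which takes a deadline. The small JDK-only sketch below reproduces that state. The class name TimedWaitingSketch is hypothetical, and the far-future deadline is an assumption for illustration; the actual deadline ZkClient passes is not stated in this thread.

```java
import java.util.Date;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class TimedWaitingSketch {
    // Starts a thread that waits on a condition with a far-future deadline
    // and returns the thread state observed once it has parked.
    static Thread.State observeParkedState() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        Condition connected = lock.newCondition();
        Thread waiter = new Thread(() -> {
            lock.lock();
            try {
                // Assumed deadline: Integer.MAX_VALUE ms (about 24.8 days) from now.
                connected.awaitUntil(new Date(System.currentTimeMillis() + Integer.MAX_VALUE));
            } catch (InterruptedException e) {
                // Interrupted by the observer below; just exit.
            } finally {
                lock.unlock();
            }
        }, "waiter");
        waiter.setDaemon(true);
        waiter.start();

        // Poll until the waiter has parked, giving up after five seconds.
        Thread.State state = waiter.getState();
        long giveUpAt = System.currentTimeMillis() + 5000;
        while (state != Thread.State.TIMED_WAITING && System.currentTimeMillis() < giveUpAt) {
            Thread.sleep(10);
            state = waiter.getState();
        }
        waiter.interrupt();
        waiter.join(1000);
        return state;
    }
}
```

A wait like this does have a deadline, but if it is far enough away, or the wait is retried in a loop, the shutdown thread is blocked for all practical purposes.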

-Jaikiran

On Monday 26 January 2015 05:46 AM, Neha Narkhede wrote:
> For a clean shutdown, the broker tries to talk to the controller and also
> issues reads to zookeeper. Possibly that is where it tries to reconnect to
> zk. It will help to look at the thread dump.
>
> Thanks
> Neha
>


Re: Cannot stop Kafka server if zookeeper is shutdown first

Posted by Neha Narkhede <ne...@confluent.io>.
For a clean shutdown, the broker tries to talk to the controller and also
issues reads to zookeeper. Possibly that is where it tries to reconnect to
zk. It will help to look at the thread dump.

Thanks
Neha

On Fri, Jan 23, 2015 at 8:53 PM, Jaikiran Pai <ja...@gmail.com>
wrote:

> I was just playing around with the RC2 of 0.8.2 and noticed that if I
> shutdown zookeeper first I can't shutdown Kafka server at all since it goes
> into a never ending attempt to reconnect with zookeeper. I had to kill the
> Kafka process to stop it. I tried it against trunk too and there too I see
> the same issue. Should I file a JIRA for this and see if I can come up with
> a patch?
>
> FWIW, here's the unending (and IMO too frequent) attempts at trying to
> reconnect. I've a thread dump too which shows that the other thread which
> is trying to complete a controlled shutdown of Kafka is blocked forever for
> the zookeeper to be up. I can attach it to the JIRA.
>
> [2015-01-24 10:15:46,278] WARN Session 0x14b1a4136800000 for server null,
> unexpected error, closing socket connection and attempting reconnect
> (org.apache.zookeeper.ClientCnxn)
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> [2015-01-24 10:15:47,437] INFO Opening socket connection to server
> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
> (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2015-01-24 10:15:47,438] WARN Session 0x14b1a4136800000 for server null,
> unexpected error, closing socket connection and attempting reconnect
> (org.apache.zookeeper.ClientCnxn)
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> [2015-01-24 10:15:49,056] INFO Opening socket connection to server
> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
> (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2015-01-24 10:15:49,057] WARN Session 0x14b1a4136800000 for server null,
> unexpected error, closing socket connection and attempting reconnect
> (org.apache.zookeeper.ClientCnxn)
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> [2015-01-24 10:15:50,801] INFO Opening socket connection to server
> localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL
> (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2015-01-24 10:15:50,802] WARN Session 0x14b1a4136800000 for server null,
> unexpected error, closing socket connection and attempting reconnect
> (org.apache.zookeeper.ClientCnxn)
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(
> SocketChannelImpl.java:739)
>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(
> ClientCnxnSocketNIO.java:361)
>     at org.apache.zookeeper.ClientCnxn$SendThread.run(
> ClientCnxn.java:1081)
>
>
>
>
> -Jaikiran
>

