You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Srikanth <sr...@gmail.com> on 2018/01/29 07:28:38 UTC

Re: Kafka Streams app error while rebalancing

Matthias,

We moved to streams client 1.0.0 with 0.10.2.x broker now. Haven't see this
issue ever since.
Thanks!

Srikanth

On Thu, Dec 7, 2017 at 12:15 AM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Running Streams 1.0.0 should just work with 0.10.2 brokers. Of course,
> you can't use EOS feature.
>
> Not sure how your app got into this bad state. But if is does not
> recover from it, delete the store directory seems to be reasonable --
> also for stateful application, you won't loose data, as we recreate the
> stores from the changelog topic. A bug, does not mean not production
> ready IMHO. Many people run 0.10.2 in production. But newer versions
> should fix bugs of course :)
>
> Please let us know if 1.0.0 works for you.
>
> -Matthias
>
> On 12/6/17 3:18 AM, Srikanth wrote:
> > Thanks Matthias. Please find my response inline.
> >
> > On Wed, Dec 6, 2017 at 12:34 AM, Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> >> Hard to say.
> >>
> >> However, deleting state directories will not have any negative impact as
> >> you don't use stores. Thus, why do you not want to do this?
> >> <Sri> We are running this in a prod-like setup. I don't have SSH access
> to
> >> the box.
> >
> > Its not ideal to request deleting these every now and then. It doesn't
> >> sound prod ready.
> >>
> >
> >
> >> Another workaround you can do, it to start four applications with 1
> >> thread each -- this would isolate the instances further and avoid the
> >> lock issue (you would need to run on different host, or on the same
> >> host, but with different state directory for the different instances.)
> >> <Sri> App has a large in-mem cache. I'd like more threads to use this
> >> cache to keep memory/cpu usage in balance.
> >
> >
> >
> >
> >> In general, I would recommend to upgrade you applications -- you don't
> >> need to upgrade the brokers for this. 1.0 versions has many additional
> >> bug fixes. This should resolve the issues you are facing.
> >
> > <Sri> I haven't really tried bidirectional compatibility mode. Is there
> any
> >> caveat I need to be aware of for 1.0.x client and 0.10.2.1 brokers?
> >
> > I probably should try this.
> >
> >
> >
> >> Otherwise, it's hard to say without logs.
> >> <Sri> Unfortunately, that's all I had in the logs.  I know it isn't
> >> helpful. I've kept default logging to WARN except for few packages.
> >
> > I assume any logging that indicates why app isn't making progress would
> be
> >> in WARN. I'll bump the level up for few days.
> >
> >
> >
> >> Hope this helps.
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 12/5/17 9:35 AM, Srikanth wrote:
> >>> Hello,
> >>>
> >>> We noticed that a kafka streams app is stuck in rebalance state with
> >> below
> >>> error.
> >>> Two instance of the app were running fine until a rebalace was
> >>> triggered(possibly due to network issue).
> >>> Both app instance are running(no app restart)
> >>> App itself doesn't create/use state store. NUM_STREAM_THREADS_CONFIG=2
> >>>
> >>> I did see several tickets with similar errors that are marked as fixed.
> >> I'm
> >>> using version 0.10.2.1.
> >>>
> >>> 17/12/04 18:34:57 WARN StreamThread: Could not create task 0_2. Will
> >> retry.
> >>> org.apache.kafka.streams.errors.LockException: task [0_2] Failed to
> lock
> >>> the state directory for task 0_2
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.
> >> ProcessorStateManager.<init>(ProcessorStateManager.java:100)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.AbstractTask.<init>(
> >> AbstractTask.java:73)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.
> >> StreamTask.<init>(StreamTask.java:108)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.
> >> StreamThread.createStreamTask(StreamThread.java:864)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.
> >> createTask(StreamThread.java:1237)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.StreamThread$
> >> AbstractTaskCreator.retryWithBackoff(StreamThread.java:1210)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.
> >> StreamThread.addStreamTasks(StreamThread.java:967)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.StreamThread.access$600(
> >> StreamThread.java:69)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.StreamThread$1.
> >> onPartitionsAssigned(StreamThread.java:234)
> >>>   at
> >>> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
> >> onJoinComplete(ConsumerCoordinator.java:259)
> >>>   at
> >>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
> >> joinGroupIfNeeded(AbstractCoordinator.java:352)
> >>>   at
> >>> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.
> >> ensureActiveGroup(AbstractCoordinator.java:303)
> >>>   at
> >>> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(
> >> ConsumerCoordinator.java:290)
> >>>   at
> >>> org.apache.kafka.clients.consumer.KafkaConsumer.
> >> pollOnce(KafkaConsumer.java:1029)
> >>>   at
> >>> org.apache.kafka.clients.consumer.KafkaConsumer.poll(
> >> KafkaConsumer.java:995)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(
> >> StreamThread.java:592)
> >>>   at
> >>> org.apache.kafka.streams.processor.internals.
> >> StreamThread.run(StreamThread.java:361)
> >>> 17/12/04 18:34:58 WARN StreamThread: Could not create task 0_8. Will
> >> retry.
> >>> org.apache.kafka.streams.errors.LockException: task [0_8] Failed to
> lock
> >>> the state directory for task 0_8
> >>>
> >>>
> >>> kafka-consumer-groups --new-consumer --bootstrap-server <...>
> --describe
> >>> --group GeoTest
> >>> Note: This will only show information about consumers that use the Java
> >>> consumer API (non-ZooKeeper-based consumers).
> >>>
> >>> Warning: Consumer group 'GeoTest' is rebalancing.
> >>>
> >>> I keep seeing the above lock exception continuously and app is not
> making
> >>> any progress. Any idea why it is stuck?
> >>> I read a few suggestions that required me to manually delete state
> >>> directory. I'd like to avoid that.
> >>>
> >>> Thanks,
> >>> Srikanth
> >>>
> >>
> >>
> >
>
>