You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Dmitriy Vsekhvalnov <dv...@gmail.com> on 2017/07/03 12:30:09 UTC

Re: kafka-streams app(s) stopped consuming new events

Thanks Damian !

That's was it, after fixing number compaction threads to be higher than 1,
it finally continue to consume stream.

On Fri, Jun 30, 2017 at 7:48 PM, Dmitriy Vsekhvalnov <dvsekhvalnov@gmail.com
> wrote:

> Yeah, can confirm there is only 1 vCPU.
>
> Okay, will try that configuration and get back to you guys.
>
> Thanks so far !
>
> On Fri, Jun 30, 2017 at 7:35 PM, Damian Guy <da...@gmail.com> wrote:
>
>> I'm wondering if it is related to this:
>> http://docs.confluent.io/current/streams/faq.html#rocksdb-
>> behavior-in-1-core-environments
>> It seems like the same issue.
>>
>> Thanks,
>> Damian
>>
>> On Fri, 30 Jun 2017 at 17:16 Dmitriy Vsekhvalnov <dv...@gmail.com>
>> wrote:
>>
>> > Yes, StreamThread-1 #93 daemon is still at at org.rocksdb.RocksDB.put.
>> >
>> > No, AWS machine.
>> >
>> > kafka-streams v0.10.2.1
>> >
>> > May be some option for RockDB that can unlock it? Also i can try to run
>> app
>> > locally against same env to see if it make difference (though it will be
>> > different OS).
>> >
>> > On Fri, Jun 30, 2017 at 6:33 PM, Damian Guy <da...@gmail.com>
>> wrote:
>> >
>> > > Yep, if you take another thread dump is it in the same spot?
>> > > Which version of streams are you running? Are you using docker?
>> > >
>> > > Thanks,
>> > > Damian
>> > >
>> > > On Fri, 30 Jun 2017 at 16:22 Dmitriy Vsekhvalnov <
>> dvsekhvalnov@gmail.com
>> > >
>> > > wrote:
>> > >
>> > > > Is this one looking suspicious?
>> > > >
>> > > >
>> > > > https://gist.github.com/dvsekhvalnov/146ba41c8e78316941098997c9d2f1
>> > > 8a#file-thread-dump
>> > > >
>> > > > On Fri, Jun 30, 2017 at 6:10 PM, Damian Guy <da...@gmail.com>
>> > > wrote:
>> > > >
>> > > > > It looks like Task [4_2] is stuck restoring state, though it
>> doesn't
>> > > look
>> > > > > like there is much state to restore.
>> > > > > It might be helpful if you take some thread dumps to see where it
>> is
>> > > > > blocked.
>> > > > >
>> > > > > Thanks,
>> > > > > Damian
>> > > > >
>> > > > > On Fri, 30 Jun 2017 at 16:04 Dmitriy Vsekhvalnov <
>> > > dvsekhvalnov@gmail.com
>> > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Set  org.apache.kafka.streams to DEBUG.
>> > > > > >
>> > > > > > Here is gist:
>> > > > > >
>> > > > > >
>> > https://gist.github.com/dvsekhvalnov/b84b72349837f6c6394f1adfe18cdb
>> > > > > 61#file-debug-logs
>> > > > > >
>> > > > > > On Fri, Jun 30, 2017 at 12:37 PM, Dmitriy Vsekhvalnov <
>> > > > > > dvsekhvalnov@gmail.com> wrote:
>> > > > > >
>> > > > > > > Sure, how to enable debug logs? Just adjust logback logger?
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, Jun 30, 2017 at 11:24 Eno Thereska <
>> > eno.thereska@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > >> It’s hard to tell, the logs do not contain much, I agree. It
>> > could
>> > > > be
>> > > > > a
>> > > > > > >> number of things.
>> > > > > > >>
>> > > > > > >> If it’s happening as you say on restart as well (so it’s
>> > > > > reproducible),
>> > > > > > >> any chance you could start streaming with DEBUG logs on and
>> > > collect
>> > > > > > those
>> > > > > > >> logs? I’m hoping something shows up there.
>> > > > > > >>
>> > > > > > >> Thanks,
>> > > > > > >> Eno
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> > On Jun 28, 2017, at 5:30 PM, Dmitriy Vsekhvalnov <
>> > > > > > >> dvsekhvalnov@gmail.com> wrote:
>> > > > > > >> >
>> > > > > > >> > Nothing for stat-change.log for giving time window. Last
>> line
>> > > > > logged 4
>> > > > > > >> > hours before app stopped.
>> > > > > > >> >
>> > > > > > >> > Any ideas so far? Personally i don't see anything relevant
>> in
>> > > > logs.
>> > > > > > >> >
>> > > > > > >> > On Wed, Jun 28, 2017 at 6:33 PM, Bill Bejeck <
>> > bill@confluent.io
>> > > >
>> > > > > > wrote:
>> > > > > > >> >
>> > > > > > >> >> Sure, couldn't hurt.
>> > > > > > >> >>
>> > > > > > >> >> Thanks,
>> > > > > > >> >> Bill
>> > > > > > >> >>
>> > > > > > >> >> On Wed, Jun 28, 2017 at 9:51 AM, Dmitriy Vsekhvalnov <
>> > > > > > >> >> dvsekhvalnov@gmail.com
>> > > > > > >> >>> wrote:
>> > > > > > >> >>
>> > > > > > >> >>> Here are logs:
>> > > > > > >> >>>
>> > > > > > >> >>> app:
>> > > > > > >> >>> https://gist.github.com/dvsekhvalnov/
>> > > > > f98afc3463f0c63b1722417e3710a8
>> > > > > > >> >>> e7#file-kafka-streams-log
>> > > > > > >> >>> brokers:
>> > > > > > >> >>> https://gist.github.com/dvsekhvalnov/
>> > > > > 8e870f7347394e8d004c282880ef38
>> > > > > > >> >>> 5a#file-kafka-broker-1-2-3-log
>> > > > > > >> >>>
>> > > > > > >> >>> All broker logs are same, so single gist.
>> > > > > > >> >>>
>> > > > > > >> >>> There are also state-change.log files, do you want to
>> take a
>> > > > look
>> > > > > at
>> > > > > > >> >> those
>> > > > > > >> >>> as well?
>> > > > > > >> >>>
>> > > > > > >> >>> On Wed, Jun 28, 2017 at 4:31 PM, Bill Bejeck <
>> > > bill@confluent.io
>> > > > >
>> > > > > > >> wrote:
>> > > > > > >> >>>
>> > > > > > >> >>>> Hi Dmitry,
>> > > > > > >> >>>>
>> > > > > > >> >>>> At the moment I don't have anything specific to look
>> for,
>> > > just
>> > > > > > trying
>> > > > > > >> >> to
>> > > > > > >> >>>> get more context around the issue.
>> > > > > > >> >>>>
>> > > > > > >> >>>> As for the logs maybe broker and streams logs for the
>> last
>> > 30
>> > > > > > minutes
>> > > > > > >> >> up
>> > > > > > >> >>> to
>> > > > > > >> >>>> the time the application stopped processing records.
>> > > > > > >> >>>>
>> > > > > > >> >>>> Thanks,
>> > > > > > >> >>>> Bill
>> > > > > > >> >>>>
>> > > > > > >> >>>> On Wed, Jun 28, 2017 at 9:04 AM, Dmitriy Vsekhvalnov <
>> > > > > > >> >>>> dvsekhvalnov@gmail.com
>> > > > > > >> >>>>> wrote:
>> > > > > > >> >>>>
>> > > > > > >> >>>>> Hi Bill,
>> > > > > > >> >>>>>
>> > > > > > >> >>>>> 1. sure, can extract some logs, what exactly to look
>> for?
>> > > > There
>> > > > > > are
>> > > > > > >> >> 11
>> > > > > > >> >>>>> hours of logs and most of them looks like:
>> > > > > > >> >>>>>
>> > > > > > >> >>>>> [2017-06-27 03:30:50,553] [] [INFO ] [StreamThread-1]
>> > > > > > >> >>>>>
>> > [org.apache.kafka.streams.processor.internals.StreamThread]
>> > > > > > >> >>>> [stream-thread
>> > > > > > >> >>>>> [StreamThread-1] Committing all tasks because the
>> commit
>> > > > > interval
>> > > > > > >> >>> 5000ms
>> > > > > > >> >>>>> has elapsed]
>> > > > > > >> >>>>>
>> > > > > > >> >>>>> [2017-06-27 03:30:50,553] [] [INFO ] [StreamThread-1]
>> > > > > > >> >>>>>
>> > [org.apache.kafka.streams.processor.internals.StreamThread]
>> > > > > > >> >>>> [stream-thread
>> > > > > > >> >>>>> [StreamThread-1] Committing task StreamTask 0_0]
>> > > > > > >> >>>>>
>> > > > > > >> >>>>> [2017-06-27 03:30:50,554] [] [INFO ] [StreamThread-1]
>> > > > > > >> >>>>>
>> > [org.apache.kafka.streams.processor.internals.StreamThread]
>> > > > > > >> >>>> [stream-thread
>> > > > > > >> >>>>> [StreamThread-1] Committing task StreamTask 2_0]
>> > > > > > >> >>>>>
>> > > > > > >> >>>>> Something specific to search for?
>> > > > > > >> >>>>>
>> > > > > > >> >>>>> 2. Yes, there are more messages coming to topic.
>> > > > > > >> >>>>>
>> > > > > > >> >>>>> On Wed, Jun 28, 2017 at 3:43 PM, Bill Bejeck <
>> > > > bill@confluent.io
>> > > > > >
>> > > > > > >> >>> wrote:
>> > > > > > >> >>>>>
>> > > > > > >> >>>>>> Hi Dimitry,
>> > > > > > >> >>>>>>
>> > > > > > >> >>>>>> I'm happy to help, but I could use more information.
>> Can
>> > > you
>> > > > > > share
>> > > > > > >> >>> the
>> > > > > > >> >>>>>> streams logs and broker logs?
>> > > > > > >> >>>>>>
>> > > > > > >> >>>>>> Have you confirmed messages are still being delivered
>> to
>> > > > topics
>> > > > > > >> >> (via
>> > > > > > >> >>>>>> console consumer)?
>> > > > > > >> >>>>>>
>> > > > > > >> >>>>>> Thanks,
>> > > > > > >> >>>>>> Bill
>> > > > > > >> >>>>>>
>> > > > > > >> >>>>>> On Wed, Jun 28, 2017 at 8:24 AM, Dmitriy Vsekhvalnov <
>> > > > > > >> >>>>>> dvsekhvalnov@gmail.com
>> > > > > > >> >>>>>>> wrote:
>> > > > > > >> >>>>>>
>> > > > > > >> >>>>>>> Hi all,
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> looking for some assistance in debugging
>> kafka-streams
>> > > > > > >> >> application.
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> Kafka broker  0.10.2.1  - x3 Node cluster
>> > > > > > >> >>>>>>> kafka-streams 0.10.2.1 -  x2 application nodes x 1
>> > stream
>> > > > > thread
>> > > > > > >> >>>> each.
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> In streams configuration only:
>> > > > > > >> >>>>>>> - SSL transport
>> > > > > > >> >>>>>>> - kafka.streams.commitIntervalMs set to 5000
>> (instead of
>> > > > > default
>> > > > > > >> >>>> 30s).
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> We running simple aggregation app with several
>> grouping
>> > > > > streams.
>> > > > > > >> >>>>> Running
>> > > > > > >> >>>>>> 2
>> > > > > > >> >>>>>>> instances of an app for redundancy. Both instances
>> were
>> > > > > working
>> > > > > > >> >>>> pretty
>> > > > > > >> >>>>>> fine
>> > > > > > >> >>>>>>> for 11 hours 15 minutes then stopped consuming new
>> > events
>> > > > from
>> > > > > > >> >>> topic.
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> Hosting JVM processes were working fine, just streams
>> > > > stopped
>> > > > > > >> >>>> reacting
>> > > > > > >> >>>>> to
>> > > > > > >> >>>>>>> new data. No exceptions, errors, e.t.c. in logs.
>> After
>> > > > > restart
>> > > > > > >> >>>> streams
>> > > > > > >> >>>>>>> still not consuming new messages.
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> Below is 2 last entries from kafka-streams logs from
>> > both
>> > > > > hosts:
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> [2017-06-27 14:45:09,663] [] [INFO ] [StreamThread-1]
>> > > > > > >> >>>>>>> [org.apache.kafka.streams.processor.internals.
>> > > StreamThread]
>> > > > > > >> >>>>>> [stream-thread
>> > > > > > >> >>>>>>> [StreamThread-1] Committing task StreamTask 4_2]
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> [2017-06-27 14:45:09,723] [] [INFO ] [StreamThread-1]
>> > > > > > >> >>>>>>> [org.apache.kafka.streams.processor.internals.
>> > > StreamThread]
>> > > > > > >> >>>>>> [stream-thread
>> > > > > > >> >>>>>>> [StreamThread-1] Committing task StreamTask 2_1]
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> Pretty puzzling why they stopped exactly same moment
>> > (with
>> > > > > > >> >> respect
>> > > > > > >> >>> to
>> > > > > > >> >>>>>>> millis).
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> Really appreciate any ideas where to dig to.
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>> Thank you.
>> > > > > > >> >>>>>>>
>> > > > > > >> >>>>>>
>> > > > > > >> >>>>>
>> > > > > > >> >>>>
>> > > > > > >> >>>
>> > > > > > >> >>
>> > > > > > >>
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>