Posted to users@kafka.apache.org by Pushkar Deole <pd...@gmail.com> on 2023/08/09 11:08:23 UTC

kafka streams consumer group reporting lag even on source topics removed from topology

Hi All,

I have a Kafka Streams application running as 3 instances, with
application.id set to applicationV1. The application uses the Processor API
to read from source topics, process the data, and write to a destination
topic.

It used to consume from 6 source topics. We no longer need to process data
from 2 of them, so we removed those 2 topics from the source topic list. We
have a Datadog dashboard that reports and alerts on consumer lag, and after
deploying the application without the 2 topics we started getting several
alerts about consumer lag on the applicationV1 consumer group, which is the
underlying consumer group of the streams application. When we inspected the
group with the Kafka CLI, we could see that it still reports lag against the
removed topics, and that lag shows up as steadily increasing in Datadog.

Can someone advise whether this is expected behavior? In my opinion it is
not: the streams application no longer has those topics as sources, so it
should not report lag on them.

Re: kafka streams consumer group reporting lag even on source topics removed from topology

Posted by "Matthias J. Sax" <mj...@apache.org>.
Great!


Re: kafka streams consumer group reporting lag even on source topics removed from topology

Posted by Pushkar Deole <pd...@gmail.com>.
I think I figured out a way. The Kafka CLI has a command that disassociates
a consumer group from a topic that is no longer being consumed. With it, I
could delete the committed offsets of a consumer group for a specific topic,
and that resolved the lag problem:

kafka-consumer-groups --bootstrap-server $KAFKA_BOOTSTRAP_SERVERS \
  --command-config ~/kafka.properties --delete-offsets \
  --group "<myOriginalGroup>" --topic "<myUnusedTopic>"
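
With more than one removed topic, the same command can be generated once per
topic and followed by a `--describe` call to check that the stale entries are
gone. The following is only a sketch: the broker address, the properties file
location and the two topic names are made-up placeholders, and the commands
are printed rather than executed.

```python
import shlex
from typing import List

CLI = "kafka-consumer-groups"  # same tool as in the command above


def delete_offsets_cmd(bootstrap: str, config: str, group: str, topic: str) -> List[str]:
    """Build the argv that deletes one group's committed offsets for one topic."""
    return [CLI, "--bootstrap-server", bootstrap, "--command-config", config,
            "--delete-offsets", "--group", group, "--topic", topic]


def describe_cmd(bootstrap: str, config: str, group: str) -> List[str]:
    """Build the argv that lists the group's partitions, offsets and lag."""
    return [CLI, "--bootstrap-server", bootstrap, "--command-config", config,
            "--describe", "--group", group]


# Hypothetical values, for illustration only.
BOOTSTRAP = "localhost:9092"
CONFIG = "kafka.properties"
GROUP = "applicationV1"
REMOVED_TOPICS = ["unused-topic-a", "unused-topic-b"]

commands = [delete_offsets_cmd(BOOTSTRAP, CONFIG, GROUP, t) for t in REMOVED_TOPICS]
commands.append(describe_cmd(BOOTSTRAP, CONFIG, GROUP))

for argv in commands:
    # Printed instead of executed; hand argv to subprocess.run(argv, check=True)
    # once the values point at a real cluster.
    print(shlex.join(argv))
```

As far as I know, the broker refuses to delete offsets for a topic the group
is still subscribed to, so this is best run after the new topology (without
those topics) has been deployed.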


Re: kafka streams consumer group reporting lag even on source topics removed from topology

Posted by "Matthias J. Sax" <mj...@apache.org>.
As long as the consumer group is active, nothing will be deleted. That is
the reason why you get those incorrect alerts -- Kafka cannot know that you
stopped consuming from those topics. (That is what I tried to explain --
seems I did a bad job...)

Changing the group.id (that is, the application.id) is tricky because Kafka
Streams uses it to derive the names of internal topics (repartition and
changelog topics), and thus your app would start with newly created (and
thus empty) topics. You might want to restart the app with
`auto.offset.reset = "earliest"` and reprocess all available input to
re-create state.
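
To make the naming dependency concrete, here is a small sketch of the
convention Kafka Streams uses for internal topics: the application.id is the
prefix, followed by the store or operator name and a `-changelog` or
`-repartition` suffix. The store name `counts-store` and the id
`applicationV2` are invented for the example.

```python
def changelog_topic(application_id: str, store_name: str) -> str:
    """Name of the changelog topic that backs a state store."""
    return f"{application_id}-{store_name}-changelog"


def repartition_topic(application_id: str, operator_name: str) -> str:
    """Name of the internal topic used to repartition data for an operator."""
    return f"{application_id}-{operator_name}-repartition"


# Hypothetical store name; "applicationV1" is the id from this thread.
old_changelog = changelog_topic("applicationV1", "counts-store")
new_changelog = changelog_topic("applicationV2", "counts-store")

print(old_changelog)
print(new_changelog)
# A new id gives a different topic name, so the renamed app would not find
# the state that the old app wrote to its changelog topic.
print(old_changelog != new_changelog)
```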


-Matthias


Re: kafka streams consumer group reporting lag even on source topics removed from topology

Posted by Pushkar Deole <pd...@gmail.com>.
@matthias

What are the alternatives for getting rid of this issue? When the lag starts
increasing, the alerts configured in our Datadog monitoring start sending
alerts and alarms to the reliability teams. I know that Kafka cleans up an
inactive consumer group after 7 days, but I am not sure whether that also
applies to topics that were consumed previously and are no longer consumed.

Is creating a new consumer group (by setting a different application.id on
the streams application) an option here?



Re: kafka streams consumer group reporting lag even on source topics removed from topology

Posted by "Matthias J. Sax" <mj...@apache.org>.
Well, it's kinda expected behavior. It's a split brain problem.

In the end, you use the same `application.id / group.id` and thus the
committed offsets for the removed topics are still in the
`__consumer_offsets` topic and associated with the consumer group.

If a tool inspects lag and compares the latest committed offsets to
end-offsets, it looks at everything it finds in the `__consumer_offsets`
topic for the group in question -- the tool cannot know that you changed
the application and that it does not read from those topics any longer
(and thus does not commit any longer).
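
A toy model of that comparison shows why the lag on a removed topic keeps
growing. This is only an illustration with plain dictionaries, not the code
of any real monitoring tool; the topic names and offsets are invented.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


def compute_lag(committed: Dict[TopicPartition, int],
                end_offsets: Dict[TopicPartition, int]) -> Dict[TopicPartition, int]:
    """Lag for every partition the group has a committed offset for."""
    # The loop is driven by what the group has committed, not by what the
    # application currently subscribes to.
    return {tp: max(end_offsets.get(tp, offset) - offset, 0)
            for tp, offset in committed.items()}


# Hypothetical state: "orders" is still consumed, "legacy" was removed from
# the topology, so its committed offset is frozen at 80.
committed = {("orders", 0): 120, ("legacy", 0): 80}
end_offsets = {("orders", 0): 120, ("legacy", 0): 95}

lag_before = compute_lag(committed, end_offsets)

# Producers keep writing to "legacy", but nobody commits for it any more.
end_offsets[("legacy", 0)] = 140
lag_after = compute_lag(committed, end_offsets)

print(lag_before)
print(lag_after)
```

Once the committed entry for the removed topic is deleted, the loop no longer
visits that partition and the lag disappears from the report.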

I am not sure off the top of my head if you could do a manual cleanup for
the `application.id` and topics in question and delete the committed
offsets from the `__consumer_offsets` topic -- try to check out the `Admin`
client and/or the command line tools...

I know that it's possible to delete committed offsets for a consumer group
(if a group becomes inactive, the broker would also clean up all group
metadata after a configurable timeout), but I am not sure if that's for the
entire consumer group (i.e., all topics) or if you can do it on a per-topic
basis, too.


HTH,
   -Matthias



Re: kafka streams consumer group reporting lag even on source topics removed from topology

Posted by Pushkar Deole <pd...@gmail.com>.
Hi Streams dev community, @matthias, @bruno

Any input on the above issue? Is this a bug in the streams library, where
the underlying consumer group keeps reporting lag against input topics that
were removed from the streams processor topology?
