Posted to users@kafka.apache.org by Oliver Eckle <ie...@gmx.de> on 2019/11/07 20:55:16 UTC

Consumer Lags and receive no records anymore

Hi there,

I'm seeing some pretty strange behaviour, which I have already asked about here:
https://stackoverflow.com/q/58650416/7776688

As you can see from the logs (https://pastebin.com/yrSytSHD), at a specific
point the client stops receiving records.

I have a strong suspicion that it relates to performance in handling the
records, so that I run into some kind of timeout.

What seems strange is that the client does not recover, even though
heartbeats are processed successfully.

The consumer is even still returned when inspecting the consumer group. Any
ideas? The Kafka log has no errors in it.

I'm running a cluster with 3 brokers inside a Kubernetes cluster, using the
Bitnami Helm chart.

Kind regards,

Oliver






Re: [External] AW: Consumer Lags and receive no records anymore

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
I believe the behavior has changed over time.  There is a way to explicitly set a partitioner, and they provide one: https://github.com/axbaretto/kafka/blob/master/clients/src/main/java/org/apache/kafka/clients/producer/RoundRobinPartitioner.java



AW: [External] AW: Consumer Lags and receive no records anymore

Posted by Oliver Eckle <ie...@gmx.de>.
Hi Dave,

Thank you. I saw a tutorial that said otherwise, which confused me a little.
If it's done round-robin, my "world view" makes sense again 😊

Oliver




Re: [External] AW: Consumer Lags and receive no records anymore

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
A null key results in the client sending to partitions in a round-robin order.  Use a key if you want to ensure that specific messages end up on the same partition.

-Dave
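
To illustrate the point above: each record is written to exactly one partition, never to all of them. The sketch below is a simplified model, not the Kafka client's code. The helper name make_partitioner is made up for this example, and zlib.crc32 is only a stand-in for the hash the real client uses. Depending on the client version, unkeyed records may also be spread differently than strict round-robin.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that picks a partition for a record key (None = no key)."""
    counter = itertools.count()

    def choose(key):
        if key is None:
            # No key: cycle through the partitions in turn (round-robin).
            return next(counter) % num_partitions
        # Keyed record: a stable hash keeps equal keys on the same partition.
        # crc32 is an assumption for this sketch, not the client's real hash.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose


choose = make_partitioner(2)
print([choose(None) for _ in range(4)])        # [0, 1, 0, 1]
print({choose("order-42") for _ in range(4)})  # a set with a single partition
```

With two partitions and two consumers in one group, unkeyed records are therefore split between the consumers rather than processed by both.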

On 11/8/19, 1:06 AM, "Oliver Eckle" <ie...@gmx.de> wrote:

    Hi,

    Don’t get me wrong, I just want to understand what's going on.
    So how do I figure out how many partitions are required? Trial and error?
    And as far as I understand, if I have null as the key for a record, the record is stored in all partitions.
    Is it then not also processed by each consumer, even if I have more than one consumer?
    So could you explain why the consumer stops getting data?

    Thx

    -----Original Message-----
    From: M. Manna <ma...@gmail.com>
    Sent: Friday, 8 November 2019 00:51
    To: Kafka Users <us...@kafka.apache.org>
    Subject: Re: Consumer Lags and receive no records anymore

    Hi again,

    On Thu, 7 Nov 2019 at 23:40, Oliver Eckle <ie...@gmx.de> wrote:

    > Hi,
    >
    > Slow consumers: that could be the case. But why is that an issue? I
    > mean, I am trying to use Kafka exactly for that, and for its ability
    > to recover. So if, for example, there is a burst scenario where a lot
    > of data arrives and has to be processed, a "slow consumer" will be the
    > default case.
    > What I could understand is that constantly slow consumers would be an
    > issue, e.g. if there is compaction on the topic or data is erased
    > without having been read.
    >
    > This is what I mean by the "lagging topic".
    > The scenario looks like this:
    >
    > Producer --- Topic C ---> Consumer --- Processing ---> external REST
    > Endpoint
    >
    > Sending a record to the external REST endpoint takes around 300 ms.
    > So in the "burst scenario" I mentioned above, there may be a lag of
    > 1000-2000 records.
    > The consumer pulls 500 records and processes them, which means the app
    > takes around 150 s to process the batch.
    > This could cause some timeouts, I guess. That’s why I am trying to
    > lower max.poll.records to e.g. 50, because then it takes only 15 s
    > until the poll is committed.
    >
    > Yeah, having a liveness probe sounds pretty elegant. I'll give that a
    > try.
    > Anyway, I need to understand why this is happening in order to deal
    > with the scenario the correct way. Killing the consumer after it stops
    > consuming messages seems more like a workaround to me.
    >
    > Regards
    >
    As per your previous replies, if you have 2 partitions on that topic, you can distribute the data between 2 consumers in your consumer group and process it in parallel. But given your data burst case, I would advise you to increase the number of partitions and spread the burst across them. Just like any other tool, Kafka requires a certain level of configuration to achieve what you want. I would recommend increasing both your partitions and your consumers to spread the load.

    Regards,

    >
    > -----Original Message-----
    > From: M. Manna <ma...@gmail.com>
    > Sent: Friday, 8 November 2019 00:24
    > To: users@kafka.apache.org
    > Subject: Re: Consumer Lags and receive no records anymore
    >
    > Hi,
    >
    > > On 7 Nov 2019, at 22:39, Oliver Eckle <ie...@gmx.de> wrote:
    > >
    > > I have a consumer group with one consumer for the topic. Due to a
    > > misunderstanding I have two partitions on the topic.
    > > Since no key is set for the records, I think having several
    > > consumers makes no sense, or am I wrong?
    > >
    > I am not sure why that would be an issue. If you have 1 consumer in
    > your consumer group, then yes, all the topic partitions will be
    > assigned to that consumer.
    > Slow consumer means your consumers aren’t consuming messages as fast
    > as you are producing them (or not fast enough).
    > > Is there any way to work around that?
    > > Because, for example, one lagging topic is put to an external REST
    > > service, which takes around 300 ms to handle a record.
    > What do you mean by “Lagging topic is put to an external REST service”?
    > > So is lowering max.poll.records an option?
    > Polling will keep blocking until the minimum bytes of records are
    > available. Also, it sends a heartbeat per call of poll().
    > > Anyhow, I probably cannot avoid situations like that. It sounds to
    > > me like a pretty common scenario.
    > > So how do I deal with them? With a health check that crashes the app
    > > if no data is arriving anymore?
    > In the K8s world, you can tie this to a liveness probe: if your
    > consumers aren’t live, you may choose to destroy the pod and bring it
    > back up. Provided that your offset commits adhere to your technical
    > requirements, you should be able to recover based on the last
    > committed offset. Try that and see how it goes.
    > >
    > > Regards
    > >
    > > -----Original Message-----
    > > From: M. Manna <ma...@gmail.com>
    > > Sent: Thursday, 7 November 2019 23:35
    > > To: users@kafka.apache.org
    > > Subject: Re: Consumer Lags and receive no records anymore
    > >
    > > Not consuming fast or frequently enough is one of the most common
    > > reasons for it. Have you checked how fast and how many messages
    > > you’re churning out vs. how many consumers you have in the group to
    > > handle the workload?
    > >
    > > Also, what is your partition setup for the consumer group?
    > >
    > >
    > > Regards,
    > >
    > > On Thu, 7 Nov 2019 at 22:03, Oliver Eckle <ie...@gmx.de> wrote:
    > >
    > >> Using kafka-consumer-groups.sh --bootstrap-server localhost:9092
    > >> --describe --group my-app.
    > >> I put the output in the logs. It is also pretty obvious, because no
    > >> data flows anymore.
    > >>
    > >> Regards
    > >>
    > >> -----Original Message-----
    > >> From: M. Manna <ma...@gmail.com>
    > >> Sent: Thursday, 7 November 2019 22:10
    > >> To: users@kafka.apache.org
    > >> Subject: Re: Consumer Lags and receive no records anymore
    > >>
    > >> Have you checked your Kafka consumer group status? How did you
    > >> determine that your consumers are lagging?
    > >>
    > >> Thanks,
    > >>
    > >> On Thu, 7 Nov 2019 at 20:55, Oliver Eckle <ie...@gmx.de> wrote:
    > >>
    > >>> Hi there,
    > >>>
    > >>> I'm seeing some pretty strange behaviour, which I have already
    > >>> asked about here:
    > >>> https://stackoverflow.com/q/58650416/7776688
    > >>>
    > >>> As you can see from the logs (https://pastebin.com/yrSytSHD), at a
    > >>> specific point the client stops receiving records.
    > >>>
    > >>> I have a strong suspicion that it relates to performance in
    > >>> handling the records, so that I run into some kind of timeout.
    > >>>
    > >>> What seems strange is that the client does not recover, even
    > >>> though heartbeats are processed successfully.
    > >>>
    > >>> The consumer is even still returned when inspecting the consumer
    > >>> group. Any ideas? The Kafka log has no errors in it.
    > >>>
    > >>> I'm running a cluster with 3 brokers inside a Kubernetes cluster,
    > >>> using the Bitnami Helm chart.
    > >>>
    > >>> Kind regards,
    > >>>
    > >>> Oliver
    > >>>
    > >>
    > >
    >



This e-mail and any files transmitted with it are confidential, may contain sensitive information, and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error, please notify the sender by reply e-mail immediately and destroy all copies of the e-mail and any attachments.

AW: Consumer Lags and receive no records anymore

Posted by Oliver Eckle <ie...@gmx.de>.
Hi,

I still have some questions.
Are you talking about a lagging consumer, or about a consumer that completely stops getting new records?
My issue is that the consumer stops getting new records, which then results in lag.
It's not about the lag itself. I just want to make sure we are talking about the same thing.

Anyway, I have now introduced a livenessProbe, added a record-consuming health check to my app, and upgraded to a newer Kafka version.
Currently it works fine, so I'll keep monitoring 😊
Also, thanks for the monitoring advice, which I will definitely consider and use.

Oliver
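
A record-consuming health check like the one mentioned above could look roughly like the following sketch. It is not the code from the app in this thread: the class name ConsumeHealth, the 60-second threshold and the way the lag value is obtained are all assumptions made for illustration. The idea is that an idle consumer is only unhealthy when there is lag it should be working through.

```python
import time


class ConsumeHealth:
    """Tracks when the poll loop last received records."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        self.last_record_at = clock()

    def record_received(self):
        # Call this from the poll loop whenever poll() returns records.
        self.last_record_at = self.clock()

    def is_healthy(self, current_lag):
        # Idle with nothing to read is fine; idle while lag exists is not.
        idle = self.clock() - self.last_record_at
        return current_lag == 0 or idle <= self.max_idle_seconds


# Usage with a fake clock, so the example runs without waiting.
now = [0.0]
health = ConsumeHealth(max_idle_seconds=60, clock=lambda: now[0])
now[0] = 120.0
print(health.is_healthy(current_lag=0))     # True
print(health.is_healthy(current_lag=1500))  # False
```

An HTTP endpoint that returns an error status whenever is_healthy() is false could then be wired to the Kubernetes livenessProbe.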

-----Original Message-----
From: M. Manna <ma...@gmail.com>
Sent: Friday, 8 November 2019 12:09
To: Kafka Users <us...@kafka.apache.org>
Subject: Re: Consumer Lags and receive no records anymore

Hi

On Fri, 8 Nov 2019 at 07:06, Oliver Eckle <ie...@gmx.de> wrote:

> Hi,
>
> Don’t get me wrong, I just want to understand what's going on.
> so how do I figure out, how much partitions are required? Trial and Error?
>
 Normally, you have to run your perf tests with adequate batching, partition, ISR, etc. settings to determine what speed/consistency is good for you. So it's not only trial and error; it depends on what you need.
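
As a rough starting point before a real perf test, the numbers from this thread can be turned into a back-of-the-envelope estimate. This is only a sketch: the function name consumers_needed and the 60-second drain target are assumptions for illustration, while the 300 ms per record and the 2000-record burst come from the messages above. Since a partition is read by at most one consumer in a group, the result is also a lower bound for the partition count.

```python
def consumers_needed(burst_records, per_record_ms, target_drain_ms):
    """Smallest number of parallel consumers that drains the burst in time."""
    total_work_ms = burst_records * per_record_ms
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total_work_ms // target_drain_ms)


print(consumers_needed(2000, 300, 60_000))   # 10
print(consumers_needed(2000, 300, 600_000))  # 1
```

A single consumer needs about 600 s for such a burst, so whether one partition is enough depends on how long a backlog is acceptable.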

> And as far as I understand, if I have null as key for the record, the 
> record is stored in all partitions.
>
Each record is written to one specific partition each time you do a send(), not to all of them. If you have 1 partition, then it's irrelevant.

> Is it then not also processed by each consumer, even if I have more 
> than one consumer?
>
 If you have created two consumer threads under the same consumer group, each will get 1 partition (if you have 2). Each consumer polls and processes data, and then commits offsets to indicate its position.
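
The assignment described above can be modelled in a few lines. This is a deliberately simplified sketch, not one of Kafka's actual assignment strategies; the function name assign and the consumer names are made up for the example.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers so each partition has exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


print(assign([0, 1], ["c1"]))              # {'c1': [0, 1]}
print(assign([0, 1], ["c1", "c2"]))        # {'c1': [0], 'c2': [1]}
print(assign([0, 1], ["c1", "c2", "c3"]))  # {'c1': [0], 'c2': [1], 'c3': []}
```

The last line shows why adding consumers beyond the number of partitions does not help: the extra consumer stays idle.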

> So could you explain, why the consumer stops to get data?
>
 Unfortunately, without understanding your setup, it's a broad question. So far you're saying that you've got a sudden data burst of ~2k messages. If the delay between consecutive poll() calls is long enough, there will be timeouts. Provided that your timeouts are still within the request/session limits, chances are that you are still a lot slower than you should be.
This, again, goes back to how you have determined your topic partition setup. If you do general research on various Kafka blogs or tech sites, the general conclusion is always the same: "you're slow in consuming messages".
Also, I would advise that you reproduce it in your test env (if possible) so that you can identify where the bottleneck is.
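
The delay between consecutive poll() calls can be estimated from the figures given earlier in the thread (around 300 ms per record, batches of 500 or 50 records). The sketch below only does that arithmetic. The function names are made up, and the 300000 ms limit is an assumption based on the Java consumer's documented default for max.poll.interval.ms, so check the value actually configured in your client.

```python
def batch_seconds(max_poll_records, per_record_ms):
    """Worst-case time between two poll() calls when a full batch is returned."""
    return max_poll_records * per_record_ms / 1000


def exceeds_poll_interval(max_poll_records, per_record_ms, max_poll_interval_ms):
    """True if processing a full batch takes longer than the allowed poll interval."""
    return max_poll_records * per_record_ms > max_poll_interval_ms


print(batch_seconds(500, 300))  # 150.0
print(batch_seconds(50, 300))   # 15.0
print(exceeds_poll_interval(500, 300, 300_000))   # False
print(exceeds_poll_interval(500, 1100, 300_000))  # True
```

Under these assumptions a full batch at 300 ms per record stays below the limit, but a REST endpoint that slows down to roughly a second per record would push a 500-record batch past it.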
Also, for reference

https://www.lightbend.com/blog/monitor-kafka-consumer-group-latency-with-kafka-lag-exporter

https://sematext.com/blog/kafka-consumer-lag-offsets-monitoring/

I hope this helps.

Regards,

>
> Thx
>
> -----Ursprüngliche Nachricht-----
> Von: M. Manna <ma...@gmail.com>
> Gesendet: Freitag, 8. November 2019 00:51
> An: Kafka Users <us...@kafka.apache.org>
> Betreff: Re: Consumer Lags and receive no records anymore
>
> Hi again,
>
> On Thu, 7 Nov 2019 at 23:40, Oliver Eckle <ie...@gmx.de> wrote:
>
> > Hi,
> >
> > slow consumers - that could be the case. But why is that an issue? I 
> > mean I try to use kafka exactly for that and the ability to recover.
> > So e.g if there is some burst scenario where a lot of data arrives 
> > and has to be processed, a "slow consumer" will be the default case.
> > What I could understand, that having constantly slow consumers will 
> > be an issue, e.g. if there is some compaction on the topic or data 
> > will be erased, without having been read.
> >
> > This is what I think about the "lagging topic"
> > The scenario is like that:
> >
> > Producer --- Topic C ---> Consumer --- Processing ---> external REST 
> > Endpoint
> >
> > Sending a Record to the external REST Endpoint takes around 300ms.
> > So if I have the "burst scenario" I mentioned above, there is maybe 
> > a lag of 1000-2000 records.
> > So consumer is pulling 500 and process them, which means it takes 
> > around 150s for the app to process the records.
> > This could create some timeouts I guess ... so that’s the reason why 
> > I try to lower the poll records to 50 e.g. cause then is takes only 
> > 15s until the poll is committed.
> >
> > Yeah having some liveness probe sounds pretty elegant .. give that a 
> > try ...
> > Anyway, I need to understand why that is happening to deal with the 
> > scenario the correct way.. killing the consumer after he stops to 
> > consume messages, seems to me more like a workaround.
> >
> > Regards
> >
> As per your previous replies, if you have 2 partitions with that 
> topic, you can distribute all data between 2 consumers in your cgroup, 
> and process information. But given your data burst case, I would 
> advise you increase your number of partitions and spread the burst 
> across. Just like any other tool, Kafka requires certain level of 
> configuration to achieve what you want. I would recommend you increase 
> your partitions and consumers to spread the load.
>
> Regards,
>


Re: Consumer Lags and receive no records anymore

Posted by "M. Manna" <ma...@gmail.com>.
Hi

On Fri, 8 Nov 2019 at 07:06, Oliver Eckle <ie...@gmx.de> wrote:

> Hi,
>
> Don’t get me wrong, I just want to understand what's going on.
> So how do I figure out how many partitions are required? Trial and error?
>
 Normally, you run a performance test with realistic settings for
batching, partition count, ISR, etc. to determine what throughput and
consistency are good enough for you. So it's not just trial and error; it
depends on what you need.
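
A rough back-of-the-envelope sketch of where such a test usually ends up, assuming each consumer handles one record at a time. The arrival rate of 10 records/s is a made-up figure; the 300 ms per record is the REST call time mentioned in this thread. Since a partition is read by at most one consumer in a group, the result is also a lower bound for the partition count.

```python
import math


def consumers_needed(arrival_rate_per_s: int, per_record_ms: int) -> int:
    """Smallest number of consumers that keeps up with a steady arrival rate.

    Assumes every consumer processes records strictly one after another.
    """
    # One consumer finishes 1000 / per_record_ms records per second.
    return max(1, math.ceil(arrival_rate_per_s * per_record_ms / 1000))


# Hypothetical load: 10 records/s, 300 ms per record.
print(consumers_needed(10, 300))
```

This only covers the steady state; measured numbers from a real test should replace the assumed ones.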

> And as far as I understand, if the record key is null, the
> record is stored in all partitions.
>
Each record is written to one specific partition every time you call
send(). If you have only 1 partition, the question is irrelevant.
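
To illustrate the point, here is a toy partitioner. It is only a sketch: `make_partitioner` is a hypothetical helper, `zlib.crc32` stands in for the hash Kafka really uses, and the round-robin handling of null keys reflects what is described elsewhere in this thread (the exact behaviour has changed between client versions). The thing to notice is that every record ends up in exactly one partition, never in all of them.

```python
import itertools
import zlib


def make_partitioner(num_partitions: int):
    """Return a function that picks exactly one partition per record."""
    counter = itertools.count()

    def choose(key):
        if key is None:
            # No key: rotate through the partitions.
            return next(counter) % num_partitions
        # Keyed record: the same key always maps to the same partition.
        # crc32 is only a stand-in for Kafka's real hash function.
        return zlib.crc32(key) % num_partitions

    return choose


choose = make_partitioner(2)
unkeyed = [choose(None) for _ in range(4)]        # alternates 0, 1, 0, 1
keyed = {choose(b"order-42") for _ in range(4)}   # a single partition
print(unkeyed, keyed)
```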

> Is it then not also processed by every consumer if I have more than
> one consumer?
>
 If you have created two consumer threads in the same consumer group,
each will get 1 partition (if the topic has 2). Each consumer polls and
processes data, then commits offsets to record its position.
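
A simplified picture of that assignment, as a sketch only: `assign` is a hypothetical helper that deals partitions out in turn, which is not Kafka's actual assignor but gives the same result for the small cases discussed here.

```python
def assign(partitions, consumers):
    """Deal partitions out to the consumers of one group, one at a time."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


print(assign([0, 1], ["c1"]))              # one consumer owns both partitions
print(assign([0, 1], ["c1", "c2"]))        # one partition each
print(assign([0, 1], ["c1", "c2", "c3"]))  # the third consumer stays idle
```

Each partition has exactly one owner within the group, so a record is processed once per group, and adding consumers beyond the partition count gains nothing.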

> So could you explain why the consumer stops getting data?
>
 Unfortunately, without understanding your setup, that is a broad question.
So far you're saying that you've got a sudden data burst of ~2k messages. If
the delay between consecutive poll() calls is long enough, you will run into
timeouts. Provided that your timeouts are still within the request/session
limits, the chances are that you are still consuming a lot slower than you
should be. This, again, goes back to how you determined your topic partition
setup. If you do some general research on Kafka blogs or tech sites, the
conclusion is always the same: "you're slow in consuming messages".
Also, I would advise you to reproduce it in your test environment (if
possible) so that you can identify where the bottleneck is.
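
The timing can be checked with simple arithmetic. The sketch below uses the figures from this thread (300 ms per record, batches of 500 or 50) and assumes the Java consumer's documented default of 300000 ms for max.poll.interval.ms; the 50% safety margin and the function names are arbitrary choices for illustration.

```python
MAX_POLL_INTERVAL_MS = 300_000  # assumed default of max.poll.interval.ms


def batch_time_ms(max_poll_records: int, per_record_ms: int) -> int:
    """Worst-case time between two poll() calls for a full batch."""
    return max_poll_records * per_record_ms


def largest_safe_batch(per_record_ms: int,
                       interval_ms: int = MAX_POLL_INTERVAL_MS,
                       safety: float = 0.5) -> int:
    """Largest max.poll.records that uses only `safety` of the interval."""
    return max(1, int(interval_ms * safety) // per_record_ms)


print(batch_time_ms(500, 300))   # full default-sized batch, in ms
print(batch_time_ms(50, 300))    # reduced batch, in ms
print(largest_safe_batch(300))   # batch size that fits the margin
```

With these figures a full batch of 500 takes 150 s, which is still below the assumed default interval, so timeouts would only be expected if the REST calls are slower than 300 ms or the interval has been lowered.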
Also, for reference

https://www.lightbend.com/blog/monitor-kafka-consumer-group-latency-with-kafka-lag-exporter

https://sematext.com/blog/kafka-consumer-lag-offsets-monitoring/

I hope this helps.

Regards,


AW: Consumer Lags and receive no records anymore

Posted by Oliver Eckle <ie...@gmx.de>.
Hi, 

Don’t get me wrong, I just want to understand what's going on.
So how do I figure out how many partitions are required? Trial and error?
And as far as I understand, if the record key is null, the record is stored in all partitions.
Is it then not also processed by every consumer if I have more than one consumer?
So could you explain why the consumer stops getting data?

Thx



Re: Consumer Lags and receive no records anymore

Posted by "M. Manna" <ma...@gmail.com>.
Hi again,

On Thu, 7 Nov 2019 at 23:40, Oliver Eckle <ie...@gmx.de> wrote:

> Hi,
>
> Slow consumers: that could be the case. But why is that an issue? I mean,
> that is exactly what I am trying to use Kafka for, along with its ability
> to recover.
> So e.g. if there is a burst scenario where a lot of data arrives and has
> to be processed, a "slow consumer" will be the default case.
> What I do understand is that constantly slow consumers will be an
> issue, e.g. if the topic is compacted or data is deleted before it has
> been read.
>
> This is what I mean by the "lagging topic".
> The scenario looks like this:
>
> Producer --- Topic C ---> Consumer --- Processing ---> external REST
> Endpoint
>
> Sending a record to the external REST endpoint takes around 300ms.
> So in the "burst scenario" I mentioned above, there may be a lag
> of 1000-2000 records.
> The consumer pulls 500 records and processes them, which takes around
> 150s.
> This could cause timeouts, I guess, which is why I am trying to lower
> max.poll.records to e.g. 50, because then it takes only 15s until the
> batch is committed.
>
> Yes, a liveness probe sounds pretty elegant; I will give that a try.
> Anyway, I need to understand why this is happening in order to handle the
> scenario correctly. Killing the consumer after it stops consuming
> messages seems more like a workaround to me.
>
> Regards
>
As per your previous replies, if you have 2 partitions on that topic, you
can distribute the data between 2 consumers in your consumer group and
process it in parallel. But given your data burst case, I would advise you
to increase your number of partitions and spread the burst across them. Just
like any other tool, Kafka requires a certain level of configuration to
achieve what you want. I would recommend that you increase both partitions
and consumers to spread the load.
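
As a rough illustration of what spreading the load buys, the sketch below estimates how long a burst takes to drain. It assumes the burst is distributed evenly over the partitions and that each consumer works through its records one at a time; `drain_seconds` is a hypothetical helper, and the 2000 records and 300 ms are the figures from this thread.

```python
import math


def drain_seconds(backlog: int, per_record_ms: int,
                  partitions: int, consumers: int) -> float:
    """Estimated time to work off a backlog spread evenly over partitions."""
    # Consumers beyond the partition count are idle, so they do not help.
    workers = min(partitions, consumers)
    records_per_worker = math.ceil(backlog / workers)
    return records_per_worker * per_record_ms / 1000


print(drain_seconds(2000, 300, partitions=2, consumers=1))  # one consumer
print(drain_seconds(2000, 300, partitions=2, consumers=2))  # two consumers
print(drain_seconds(2000, 300, partitions=8, consumers=8))  # more of both
print(drain_seconds(2000, 300, partitions=2, consumers=8))  # partitions cap it
```

The last call shows why partitions and consumers have to grow together: with only 2 partitions, 8 consumers drain the burst no faster than 2.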

Regards,


AW: Consumer Lags and receive no records anymore

Posted by Oliver Eckle <ie...@gmx.de>.
Hi,

Slow consumers: that could be the case. But why is that an issue? I mean, that is exactly what I am trying to use Kafka for, along with its ability to recover.
So e.g. if there is a burst scenario where a lot of data arrives and has to be processed, a "slow consumer" will be the default case.
What I do understand is that constantly slow consumers will be an issue, e.g. if the topic is compacted or data is deleted before it has been read.

This is what I mean by the "lagging topic".
The scenario looks like this:

Producer --- Topic C ---> Consumer --- Processing ---> external REST Endpoint

Sending a record to the external REST endpoint takes around 300ms.
So in the "burst scenario" I mentioned above, there may be a lag of 1000-2000 records.
The consumer pulls 500 records and processes them, which takes around 150s.
This could cause timeouts, I guess, which is why I am trying to lower max.poll.records to e.g. 50, because then it takes only 15s until the batch is committed.

Yes, a liveness probe sounds pretty elegant; I will give that a try.
Anyway, I need to understand why this is happening in order to handle the scenario correctly. Killing the consumer after it stops consuming messages seems more like a workaround to me.

Regards




Re: Consumer Lags and receive no records anymore

Posted by "M. Manna" <ma...@gmail.com>.
Hi,

> On 7 Nov 2019, at 22:39, Oliver Eckle <ie...@gmx.de> wrote:
> 
> I have a consumer group with one consumer for the topic. Due to a misunderstanding, the topic has two partitions.
> Since no key is set for the records, I think having several consumers makes no sense, or am I wrong?
> 
I am not sure why that would be an issue. If you have 1 consumer in your consumer group, then yes, all the topic partitions will be assigned to that consumer. A slow consumer means your consumers aren’t consuming messages as fast as you are producing them (or not fast enough).
> Is there any way to work around that?
> Because, for example, one lagging topic is put to an external REST service, which takes around 300ms to handle.
What do you mean by “Lagging topic is put to an external REST service”?
> So is lowering max.poll.records an option?
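poll() blocks until at least the minimum number of bytes (fetch.min.bytes) is available or its timeout expires. Also note that Java clients from 0.10.1 onwards send heartbeats from a background thread rather than on each call of poll(), so poll() must still be called within max.poll.interval.ms for the consumer to stay in the group.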
> Anyhow, I probably cannot avoid situations like that. It sounds to me like a pretty common scenario.
> So how to deal with them? With a health check that crashes the app if no data arrives anymore?
In the K8s world, you can tie this to a liveness probe: if your consumers aren’t live, you may choose to destroy the pod and bring it back up. Provided that your offset commits adhere to your technical requirements, you should be able to recover from the last committed offset. Try that and see how it goes.
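
A minimal sketch of the state such a liveness probe could check, assuming the consumer loop calls `record_poll()` every time poll() returns and the probe endpoint reports `is_alive()`. The class name, the 60 s threshold and the injectable clock are assumptions for illustration, not part of Kafka or Kubernetes.

```python
import time


class PollLiveness:
    """Tracks when the consumer loop last returned from poll()."""

    def __init__(self, max_silence_s: float, clock=time.monotonic):
        self._max_silence_s = max_silence_s
        self._clock = clock
        self._last_poll = clock()

    def record_poll(self) -> None:
        # Call this after every poll(), even if it returned no records.
        self._last_poll = self._clock()

    def is_alive(self) -> bool:
        # Unhealthy once the loop has been silent for too long.
        return self._clock() - self._last_poll <= self._max_silence_s


# Demonstration with a fake clock instead of real waiting.
now = [0.0]
probe = PollLiveness(max_silence_s=60, clock=lambda: now[0])
healthy_at_start = probe.is_alive()
now[0] = 61.0
healthy_after_stall = probe.is_alive()
probe.record_poll()
healthy_after_poll = probe.is_alive()
print(healthy_at_start, healthy_after_stall, healthy_after_poll)
```

Marking every poll() rather than every received record keeps an idle but healthy consumer from being restarted; whether "no data" should also count as a failure depends on the topic.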


AW: Consumer Lags and receive no records anymore

Posted by Oliver Eckle <ie...@gmx.de>.
I have a consumer group with one consumer for the topic. Due to a misunderstanding, the topic has two partitions.
Since no key is set on the records, I think having several consumers makes no sense, or am I wrong?

Is there any way to work around that?
For example, one lagging topic is put to an external REST service, which takes around 300 ms to handle.
So is lowering max.poll.records an option?
Anyhow, I probably cannot avoid situations like that. It sounds to me like a pretty common scenario.
So how should I deal with them? With a health check that crashes the app if no data arrives anymore?

Regards



Re: Consumer Lags and receive no records anymore

Posted by "M. Manna" <ma...@gmail.com>.
Not consuming fast or frequently enough is one of the most common reasons
for it. Have you checked how fast and how many messages you’re churning out
vs. how many consumers you have in the group to handle the workload?

Also, what is your partition setup for the consumer groups?


Regards,


AW: Consumer Lags and receive no records anymore

Posted by Oliver Eckle <ie...@gmx.de>.
I used kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-app.
I put the output in the logs. It is also pretty obvious, because no data flows anymore.

Regards



Re: Consumer Lags and receive no records anymore

Posted by "M. Manna" <ma...@gmail.com>.
Have you checked your Kafka consumer group status? How did you determine
that your consumers are lagging?

Thanks,
