Posted to users@kafka.apache.org by Shantanu Deshmukh <sh...@gmail.com> on 2018/05/23 10:18:02 UTC

Frequent consumer rebalance, auto commit failures

We have a 3 broker Kafka 0.10.0.1 cluster. There we have 3 topics with 10
partitions each. We have an application which spawns threads as consumers.
We spawn 5 consumers for each topic. I am observing that the consumer group
randomly keeps rebalancing. Many times we see logs saying "Revoking
partitions for". This happens almost every 10 minutes. Consumption
completely stops during this time.

I have applied this configuration:
max.poll.records 20
heartbeat.interval.ms 10000
session.timeout.ms 6000

This did not help. Strangely, I observed the consumer writing logs
saying "auto commit failed because poll() loop spent too much time
processing records" even when there was no data in the partition to process.
We have a polling interval of 500 ms, specified as the argument to poll().
Initially I had set the same consumer group for all three topics' consumers.
Then I specified different consumer groups for different topics' consumers.
Even this is not helping.

I have searched the web, checked my code, and tried many combinations of
configuration, but still no luck. Please help me.

*Thanks & Regards,*

*Shantanu Deshmukh*

Re: Frequent consumer rebalance, auto commit failures

Posted by amit pal <am...@gmail.com>.
Hi Shantanu,

If you are using Kafka Streams, upgrade to the latest jar. There are a bunch
of fixes in the way it uses Kafka consumers.

Apart from this, try these settings:
1. Set session.timeout.ms higher, to something like 300000.
2. Set heartbeat.interval.ms to a lower value, something like 2000.
3. Set max.poll.interval.ms to some reasonable value.

If your processing takes time, you can reduce max.poll.records down to 1.
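
As a rough illustration of the settings suggested above, here is a minimal sketch that builds them with plain java.util.Properties and checks the one rule stated in this thread, that heartbeat.interval.ms must be lower than session.timeout.ms. The class name, the helper names, and the 600000 chosen for max.poll.interval.ms are assumptions for illustration only. Note that max.poll.interval.ms was introduced with KIP-62 in the 0.10.1.0 client, so an older client would not know this setting.

```java
import java.util.Properties;

final class ConsumerSettingsSketch {

    /** Builds the settings suggested in this reply; the values are examples only. */
    static Properties suggestedSettings() {
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", "300000");   // 5 minutes
        props.setProperty("heartbeat.interval.ms", "2000");  // 2 seconds
        props.setProperty("max.poll.interval.ms", "600000"); // assumed value; needs a 0.10.1.0+ client
        props.setProperty("max.poll.records", "1");          // one record per poll() for slow processing
        return props;
    }

    /** True when heartbeat.interval.ms is strictly lower than session.timeout.ms. */
    static boolean heartbeatBelowSessionTimeout(Properties props) {
        long heartbeat = Long.parseLong(props.getProperty("heartbeat.interval.ms"));
        long session = Long.parseLong(props.getProperty("session.timeout.ms"));
        return heartbeat < session;
    }

    public static void main(String[] args) {
        Properties props = suggestedSettings();
        System.out.println("heartbeat < session timeout: " + heartbeatBelowSessionTimeout(props));
    }
}
```

With the values as typed in the first mail (heartbeat 10000, session timeout 6000) the check fails, which is a quick way to catch that kind of mix-up before the consumer starts.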



On Thu, May 24, 2018 at 9:27 PM Shantanu Deshmukh <sh...@gmail.com>
wrote:

> Hey Vincent.
> That's exactly how my code is. I am doing the processing within that for loop.
>
> In KIP-62 I read that heartbeats are sent by a separate thread
> (https://github.com/dpkp/kafka-python/issues/948). But you are saying it
> happens through polling. Which is true? I have set
> session.timeout.ms to 5 minutes and max.poll.records to 5. So even if
> each message takes 30 seconds to process, it still shouldn't cross this
> threshold. Yet I see frequent rebalances. Then there is max.poll.interval.ms
> too. I don't know exactly what it affects. Overall I am finding it very
> difficult to understand this myriad of settings, and the documentation is
> not very clear.

Re: Frequent consumer rebalance, auto commit failures

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Hey Vincent.
That's exactly how my code is. I am doing the processing within that for loop.

In KIP-62 I read that heartbeats are sent by a separate thread
(https://github.com/dpkp/kafka-python/issues/948). But you are saying it
happens through polling. Which is true? I have set
session.timeout.ms to 5 minutes and max.poll.records to 5. So even if
each message takes 30 seconds to process, it still shouldn't cross this
threshold. Yet I see frequent rebalances. Then there is max.poll.interval.ms
too. I don't know exactly what it affects. Overall I am finding it very
difficult to understand this myriad of settings, and the documentation is
not very clear.
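
The arithmetic in this message can be written down as a small check. This is only a sketch: the class and method names are made up for illustration, and it assumes the usual reading of KIP-62, namely that clients older than 0.10.1.0 send heartbeats only from inside poll() (so session.timeout.ms bounds the gap between polls), while newer clients heartbeat from a background thread and use max.poll.interval.ms as that bound. The 10-minute max.poll.interval.ms used in main comes from elsewhere in this thread.

```java
final class PollBudgetSketch {

    /** Worst-case time spent processing one batch returned by poll(). */
    static long worstCaseBatchMs(int maxPollRecords, long perRecordMs) {
        return (long) maxPollRecords * perRecordMs;
    }

    /**
     * Picks the timeout that bounds the gap between two poll() calls.
     * Assumption: without a background heartbeat thread the limit is
     * session.timeout.ms; with one (KIP-62) it is max.poll.interval.ms.
     */
    static long pollGapLimitMs(boolean clientHasBackgroundHeartbeat,
                               long sessionTimeoutMs, long maxPollIntervalMs) {
        return clientHasBackgroundHeartbeat ? maxPollIntervalMs : sessionTimeoutMs;
    }

    /** True when a full batch can be processed before the limit is reached. */
    static boolean fitsBudget(int maxPollRecords, long perRecordMs, long limitMs) {
        return worstCaseBatchMs(maxPollRecords, perRecordMs) < limitMs;
    }

    public static void main(String[] args) {
        // Numbers from the message above: 5 records, 30 s each, 5-minute session timeout.
        long limit = pollGapLimitMs(false, 300_000L, 600_000L);
        System.out.println("worst-case batch ms: " + worstCaseBatchMs(5, 30_000L));
        System.out.println("fits within limit:   " + fitsBudget(5, 30_000L, limit));
    }
}
```

Since 5 records at 30 seconds each stay well under either limit, slow processing alone may not explain the rebalances described here.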

On Thu, May 24, 2018 at 8:09 PM Vincent Maurin <vi...@glispa.com>
wrote:

> Shantanu, I was referring more to your application code.
> You should have something similar to:
>
> while (true) {
>     // Fetch the next batch; waits up to 100 ms if no records are available
>     ConsumerRecords<String, String> records = consumer.poll(100);
>     for (ConsumerRecord<String, String> record : records) {
>         // Your logic
>     }
> }
>
> You should make sure that the code within the loop doesn't take too much
> time (more than session.timeout.ms).
> From the consumer javadoc:
> "The consumer will automatically ping the cluster periodically, which lets
> the cluster know that it is alive. Note that the consumer is
> single-threaded, so periodic heartbeats can only be sent when poll(long)
> <https://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#poll(long)>
> is called. As long as the consumer is able to do this it is considered
> alive and retains the right to consume from the partitions assigned to it.
> If it stops heartbeating by failing to call poll(long)
> <https://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#poll(long)>
> for a period of time longer than session.timeout.ms then it will be
> considered dead and its partitions will be assigned to another process."
>
> Best

Re: Frequent consumer rebalance, auto commit failures

Posted by Vincent Maurin <vi...@glispa.com>.
Shantanu, I was referring more to your application code.
You should have something similar to:

while (true) {
    // Fetch the next batch; waits up to 100 ms if no records are available
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        // Your logic
    }
}

You should make sure that the code within the loop doesn't take too much
time (more than session.timeout.ms).
From the consumer javadoc:
"The consumer will automatically ping the cluster periodically, which lets
the cluster know that it is alive. Note that the consumer is
single-threaded, so periodic heartbeats can only be sent when poll(long)
<https://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#poll(long)>
is called. As long as the consumer is able to do this it is considered
alive and retains the right to consume from the partitions assigned to it.
If it stops heartbeating by failing to call poll(long)
<https://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#poll(long)>
for a period of time longer than session.timeout.ms then it will be
considered dead and its partitions will be assigned to another process."
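
One way to see whether the loop really exceeds the timeout is to measure the gap between successive poll() calls. The following is a minimal sketch under stated assumptions: PollGapMonitor is a hypothetical helper, not part of the Kafka client, and the clock is injected so the helper can be exercised without a broker or real waiting.

```java
import java.util.function.LongSupplier;

final class PollGapMonitor {
    private final LongSupplier clockMs; // injected clock, e.g. System::currentTimeMillis
    private final long limitMs;         // e.g. the configured session.timeout.ms
    private long lastPollMs = -1;
    private long worstGapMs = 0;

    PollGapMonitor(LongSupplier clockMs, long limitMs) {
        this.clockMs = clockMs;
        this.limitMs = limitMs;
    }

    /** Call right before each consumer.poll(...). Returns the gap since the previous call, 0 on the first. */
    long beforePoll() {
        long now = clockMs.getAsLong();
        long gap = (lastPollMs < 0) ? 0 : now - lastPollMs;
        lastPollMs = now;
        worstGapMs = Math.max(worstGapMs, gap);
        return gap;
    }

    long worstGapMs() {
        return worstGapMs;
    }

    /** True once any observed gap has been longer than the configured limit. */
    boolean limitExceeded() {
        return worstGapMs > limitMs;
    }

    public static void main(String[] args) throws InterruptedException {
        PollGapMonitor monitor = new PollGapMonitor(System::currentTimeMillis, 60_000L);
        for (int i = 0; i < 3; i++) {
            monitor.beforePoll(); // consumer.poll(100) would follow here
            Thread.sleep(50);     // stands in for "Your logic"
        }
        System.out.println("worst gap ms: " + monitor.worstGapMs());
        System.out.println("limit exceeded: " + monitor.limitExceeded());
    }
}
```

Logging the worst gap next to the rebalance messages would show whether the two actually coincide.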

Best

On Thu, May 24, 2018 at 4:07 PM Shantanu Deshmukh <sh...@gmail.com>
wrote:

> Another observation: when I restart my application, consumption
> doesn't start for 5-6 minutes. In the Kafka consumer logs I see:
>
> ConsumerCoordinator.333 - Revoking previously assigned partitions [] for
> group notifications-consumer
> AbstractCoordinator:381 - (Re-)joining group notifications-consumer
>
> Then nothing. After 5-6 minutes activity starts.

Re: Frequent consumer rebalance, auto commit failures

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Another observation: when I restart my application, consumption
doesn't start for 5-6 minutes. In the Kafka consumer logs I see:

ConsumerCoordinator.333 - Revoking previously assigned partitions [] for
group notifications-consumer
AbstractCoordinator:381 - (Re-)joining group notifications-consumer

Then nothing. After 5-6 minutes activity starts.
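
One possible explanation, not confirmed anywhere in this thread, is that the consumers from the previous run never left the group, so the rebalance waits for their session timeout (set to 5 minutes) to expire. If that is the case, closing each consumer on shutdown may help, because KafkaConsumer.close() leaves the group. The sketch below shows the shape of such a loop. GroupConsumer is a hypothetical stand-in for the two KafkaConsumer methods it needs, so that the example runs without a broker.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

final class GracefulShutdownSketch {

    /** Hypothetical stand-in for the two KafkaConsumer methods this sketch uses. */
    interface GroupConsumer {
        List<String> poll(long timeoutMs);
        void close();
    }

    /** Polls until 'running' is cleared, then always closes the consumer. Returns the records handled. */
    static int runUntilStopped(GroupConsumer consumer, AtomicBoolean running) {
        int handled = 0;
        try {
            while (running.get()) {
                for (String rec : consumer.poll(500)) {
                    handled++; // "Your logic" would process rec here
                }
            }
        } finally {
            consumer.close(); // with the real client, this is where the member leaves the group
        }
        return handled;
    }

    public static void main(String[] args) {
        AtomicBoolean running = new AtomicBoolean(true);
        GroupConsumer fake = new GroupConsumer() {
            @Override
            public List<String> poll(long timeoutMs) {
                running.set(false); // pretend a shutdown was requested during this poll
                return Arrays.asList("r1", "r2");
            }

            @Override
            public void close() {
                System.out.println("consumer closed");
            }
        };
        System.out.println("handled: " + runUntilStopped(fake, running));
    }
}
```

In a real application the flag would typically be cleared from a JVM shutdown hook, and KafkaConsumer.wakeup() can be used to interrupt a poll() that is still blocking.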

On Thu, May 24, 2018 at 6:49 PM Shantanu Deshmukh <sh...@gmail.com>
wrote:

> Hi Vincent,
>
> Yes, I reduced max.poll.records to get that same effect. I reduced it all
> the way down to 5 records, but I am still seeing the same error. What else
> can be done? For one topic I can see that processing a single message takes
> about 20 seconds, so 5 of them will take about 100 seconds. So I set
> session.timeout.ms to 5 minutes and max.poll.interval.ms to 10 minutes. But
> it is still not helping.

Re: Frequent consumer rebalance, auto commit failures

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Hi Vincent,

Yes, I reduced max.poll.records to get that same effect. I reduced it all
the way down to 5 records, but I am still seeing the same error. What else
can be done? For one topic I can see that processing a single message takes
about 20 seconds, so 5 of them will take about 100 seconds. So I set
session.timeout.ms to 5 minutes and max.poll.interval.ms to 10 minutes. But
it is still not helping.

On Thu, May 24, 2018 at 6:15 PM Vincent Maurin <vi...@glispa.com>
wrote:

> Hello Shantanu,
>
> It is also important to consider your consumer code. You should not spend
> too much time between two calls to the "poll" method. Otherwise, the consumer
> that is not calling poll will be considered dead by the group, triggering a
> rebalance.
>
> Best

Re: Frequent consumer rebalance, auto commit failures

Posted by Vincent Maurin <vi...@glispa.com>.
Hello Shantanu,

It is also important to consider your consumer code. You should not spend
too much time between two calls to the "poll" method. Otherwise, the consumer
that is not calling poll will be considered dead by the group, triggering a
rebalance.

Best

On Thu, May 24, 2018 at 1:45 PM M. Manna <ma...@gmail.com> wrote:

> Set your rebalance.backoff.ms=10000 and zookeeper.session.timeout.ms=30000
> in addition to what Manikumar said.

Re: Frequent consumer rebalance, auto commit failures

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Hi M. Manna,

Thanks, I will try these settings.

On Thu, May 24, 2018 at 5:15 PM M. Manna <ma...@gmail.com> wrote:

> Set your rebalance.backoff.ms=10000 and zookeeper.session.timeout.ms=30000
> in addition to what Manikumar said.

Re: Frequent consumer rebalance, auto commit failures

Posted by "M. Manna" <ma...@gmail.com>.
Set your rebalance.backoff.ms=10000 and zookeeper.session.timeout.ms=30000
in addition to what Manikumar said.



On 24 May 2018 at 12:41, Shantanu Deshmukh <sh...@gmail.com> wrote:

> Hello,
>
> There was a typo in my first mail. session.timeout.ms is actually 60000,
> not 6000, so heartbeat.interval.ms is already lower than it.

Re: Frequent consumer rebalance, auto commit failures

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Hello,

There was a typo in my first mail. session.timeout.ms is actually 60000, not
6000, so heartbeat.interval.ms is already lower than it.

On Thu, May 24, 2018 at 2:46 PM Manikumar <ma...@gmail.com> wrote:

> heartbeat.interval.ms should be lower than session.timeout.ms.
>
> Check here:
> http://kafka.apache.org/0101/documentation.html#newconsumerconfigs

Re: Frequent consumer rebalance, auto commit failures

Posted by Manikumar <ma...@gmail.com>.
heartbeat.interval.ms should be lower than session.timeout.ms.

Check here:
http://kafka.apache.org/0101/documentation.html#newconsumerconfigs
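
The rule above can be checked before a consumer is started. The following is a minimal sketch, not code from this thread: the class and method names are hypothetical, and it uses only java.util.Properties with the property names and values mentioned in the thread (session.timeout.ms taken as 60000, per the correction). The stricter one-third check reflects the guidance in the linked configuration docs that the heartbeat interval should typically be no higher than 1/3 of the session timeout.

```java
import java.util.Properties;

public class ConsumerTimeoutCheck {

    /** Consumer settings as reported in this thread (illustrative values only). */
    public static Properties threadSettings() {
        Properties props = new Properties();
        props.setProperty("max.poll.records", "20");
        props.setProperty("heartbeat.interval.ms", "10000");
        props.setProperty("session.timeout.ms", "60000");
        return props;
    }

    /** True when heartbeat.interval.ms is strictly lower than session.timeout.ms. */
    public static boolean heartbeatBelowSessionTimeout(Properties props) {
        long heartbeat = Long.parseLong(props.getProperty("heartbeat.interval.ms"));
        long session = Long.parseLong(props.getProperty("session.timeout.ms"));
        return heartbeat < session;
    }

    /** Stricter check: heartbeat interval is at most one third of the session timeout. */
    public static boolean heartbeatWithinOneThird(Properties props) {
        long heartbeat = Long.parseLong(props.getProperty("heartbeat.interval.ms"));
        long session = Long.parseLong(props.getProperty("session.timeout.ms"));
        return heartbeat * 3 <= session;
    }

    public static void main(String[] args) {
        Properties props = threadSettings();
        System.out.println("heartbeat < session timeout: " + heartbeatBelowSessionTimeout(props));
        System.out.println("heartbeat <= session timeout / 3: " + heartbeatWithinOneThird(props));
    }
}
```

With the value as first posted (session.timeout.ms 6000) the first check fails, which is the mismatch this reply points out.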


On Thu, May 24, 2018 at 2:39 PM, Shantanu Deshmukh <sh...@gmail.com>
wrote:

> Someone please help me. I have been suffering from this issue for a long
> time and cannot find a solution.

Re: Frequent consumer rebalance, auto commit failures

Posted by Shantanu Deshmukh <sh...@gmail.com>.
Someone please help me. I have been suffering from this issue for a long time
and cannot find a solution.

On Wed, May 23, 2018 at 3:48 PM Shantanu Deshmukh <sh...@gmail.com>
wrote:

> We have a 3 broker Kafka 0.10.0.1 cluster. There we have 3 topics with 10
> partitions each. We have an application which spawns threads as consumers.
> We spawn 5 consumers for each topic. I am observing that the consumer group
> keeps rebalancing at random. Many times we see logs saying "Revoking
> partitions for". This happens almost every 10 minutes. Consumption stops
> completely during this time.
>
> I have applied this configuration
> max.poll.records 20
> heartbeat.interval.ms 10000
> session.timeout.ms 6000
>
> Still this did not help. The strange thing is that I observed the consumer
> logging "auto commit failed because poll() loop spent too much time
> processing records" even when there was no data in the partition to process.
> We have a polling interval of 500 ms, specified as the argument to poll().
> Initially I had set the same consumer group for all three topics' consumers.
> Then I specified different consumer groups for different topics' consumers.
> Even this is not helping.
>
> I have searched the web, checked my code, and tried many combinations of
> configuration, but still no luck. Please help me.
>
> *Thanks & Regards,*
>
> *Shantanu Deshmukh*
>
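
The "poll() loop spent too much time processing records" message quoted above points at a time budget between successive poll() calls. The sketch below is a rough illustration, not the poster's code. It assumes a 0.10.0.x Java client, where heartbeats are sent only from inside poll(), so a full batch has to be processed within session.timeout.ms (later clients send heartbeats from a background thread and bound the loop with max.poll.interval.ms instead). The class and method names are hypothetical and the code uses plain arithmetic only, with no Kafka API calls.

```java
public class PollBudget {

    /**
     * Average time (ms) available per record if a full batch of maxPollRecords
     * must be processed before the session timeout expires.
     */
    public static long perRecordBudgetMs(long sessionTimeoutMs, int maxPollRecords) {
        if (maxPollRecords <= 0) {
            throw new IllegalArgumentException("maxPollRecords must be positive");
        }
        return sessionTimeoutMs / maxPollRecords;
    }

    /** True when a full batch at the given average per-record time fits within the session timeout. */
    public static boolean batchFits(long sessionTimeoutMs, int maxPollRecords, long avgRecordMs) {
        return avgRecordMs * maxPollRecords < sessionTimeoutMs;
    }

    public static void main(String[] args) {
        // Values from this thread: session.timeout.ms 60000, max.poll.records 20.
        System.out.println("per-record budget (ms): " + perRecordBudgetMs(60000, 20));
        // Hypothetical average processing times, chosen only for illustration.
        System.out.println("1 s per record fits: " + batchFits(60000, 20, 1000));
        System.out.println("5 s per record fits: " + batchFits(60000, 20, 5000));
    }
}
```

Under these assumptions the settings in the thread leave about 3 seconds per record on average. If real processing can take longer than that, lowering max.poll.records or raising session.timeout.ms widens the budget.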