You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by "Jin, Yun" <yu...@sap.com> on 2020/08/12 09:17:55 UTC

will partition reassignment cause same message being processed in parallel?

Hi,

Partition may be reassigned if the consumer is not reachable. Will partition reassignment cause same message being processed in parallel?
Suppose if Kafka found consumer A is not reachable (maybe because of network problem), it assigns the partition to consumer B. But actually consumer A is still running, this may cause the situation that a same message is processed in parallel by consumer A and consumer B. If the logic in processing the message includes updating some database tables, we have to lock some rows first. If such parallel processing the same message can not happen, we needn't use lock.
Appreciate if you can provide some information on this.

Best regards,
Yun

RE: will partition reassignment cause same message being processed in parallel?

Posted by "Jin, Yun" <yu...@sap.com>.
Thanks. This is a good example to trigger rebalance. We may set max.poll.records to a small number and set max.poll.interval.ms to a large number to avoid this situation. In some cases the heartbeat simply cannot reach Kakfa (maybe because of network problem), which can also trigger rebalance. In this case, the same message may also be processed in parallel.

-----Original Message-----
From: Yingshuan Song <so...@gmail.com> 
Sent: Thursday, August 13, 2020 23:42
To: users@kafka.apache.org
Subject: Re: will partition reassignment cause same message being processed in parallel?

Yes,it is possible.

Think of this :
  1 - Consumer A and consumer B ( A and B with the same consumer group.id
and enable auto commit) consume a topic T with 2 partitions,and the
assignment is (A  => T-0 , B => T-1)
  2 - The application runs correctly when the records are sent to kafka at
a low speed. let's assume :
            2-1) each consumer can process 5 records per second
            2-2) the speed of records sent to topic T is 5/partition/second
           => so that each time the consumer can poll 5 records and process
correctly in one second.
  3 - The speed of records sent to partiton T-0 increases to 500/second,so
consumer A can poll 500 records next time and 500/5 = 100s needed to
process.
  4 - If the config - 'max.poll.interval.ms' of consumer is less than
100s,kafka broker will think consumer A is dead 100 seconds later even if A
is working hard and a rebalance will be invoked.
  5 - Consumer B will take over both the two partitions in the next round
of balance and reprocess the records in T-0.
  6 - Step 4 && 5 will keep going and never recover automatically.

Hope this helps

Jin, Yun <yu...@sap.com> 于2020年8月12日周三 下午5:18写道:

> Hi,
>
> Partition may be reassigned if the consumer is not reachable. Will
> partition reassignment cause same message being processed in parallel?
> Suppose if Kafka found consumer A is not reachable (maybe because of
> network problem), it assigns the partition to consumer B. But actually
> consumer A is still running, this may cause the situation that a same
> message is processed in parallel by consumer A and consumer B. If the logic
> in processing the message includes updating some database tables, we have
> to lock some rows first. If such parallel processing the same message can
> not happen, we needn't use lock.
> Appreciate if you can provide some information on this.
>
> Best regards,
> Yun
>

Re: will partition reassignment cause same message being processed in parallel?

Posted by Yingshuan Song <so...@gmail.com>.
Yes,it is possible.

Think of this :
  1 - Consumer A and consumer B ( A and B with the same consumer group.id
and enable auto commit) consume a topic T with 2 partitions,and the
assignment is (A  => T-0 , B => T-1)
  2 - The application runs correctly when the records are sent to kafka at
a low speed. let's assume :
            2-1) each consumer can process 5 records per second
            2-2) the speed of records sent to topic T is 5/partition/second
           => so that each time the consumer can poll 5 records and process
correctly in one second.
  3 - The speed of records sent to partiton T-0 increases to 500/second,so
consumer A can poll 500 records next time and 500/5 = 100s needed to
process.
  4 - If the config - 'max.poll.interval.ms' of consumer is less than
100s,kafka broker will think consumer A is dead 100 seconds later even if A
is working hard and a rebalance will be invoked.
  5 - Consumer B will take over both the two partitions in the next round
of balance and reprocess the records in T-0.
  6 - Step 4 && 5 will keep going and never recover automatically.

Hope this helps

Jin, Yun <yu...@sap.com> 于2020年8月12日周三 下午5:18写道:

> Hi,
>
> Partition may be reassigned if the consumer is not reachable. Will
> partition reassignment cause same message being processed in parallel?
> Suppose if Kafka found consumer A is not reachable (maybe because of
> network problem), it assigns the partition to consumer B. But actually
> consumer A is still running, this may cause the situation that a same
> message is processed in parallel by consumer A and consumer B. If the logic
> in processing the message includes updating some database tables, we have
> to lock some rows first. If such parallel processing the same message can
> not happen, we needn't use lock.
> Appreciate if you can provide some information on this.
>
> Best regards,
> Yun
>