Posted to dev@kafka.apache.org by Dibyendu Bhattacharya <di...@gmail.com> on 2020/08/09 13:54:26 UTC
Consumer Rebalance issue and follow-up on KAFKA-9752
Hi,
This is in regard to the https://issues.apache.org/jira/browse/KAFKA-9752
issue, where consumer rebalance can be stuck after new member timeout with
old JoinGroup version.
We have taken the fix, but now see a different issue.
Earlier the consumer group was stuck in the "PendingRebalance" state, which
is no longer happening, but now I see members that are not able to join the
group. I see the logs below, where pending members are being removed after
session timeout.
[2020-08-09 09:29:00,558] INFO [GroupCoordinator 5]: Pending member XXX in
group YYY has been removed after session timeout expiration.
(kafka.coordinator.group.GroupCoordinator)
[2020-08-09 09:29:55,856] INFO [GroupCoordinator 5]: Pending member ZZZ in
group YYY has been removed after session timeout expiration.
(kafka.coordinator.group.GroupCoordinator)
As I read the GroupCoordinator code, when a new member tries to join for
the first time, GroupCoordinator also schedules addPendingMemberExpiration
(in the doUnknownJoinGroup call) with *SessionTimeOut*…
If for some reason the addMemberAndRebalance call takes longer (longer than
SessionTimeOut), and members are still in the “Pending” state, the above
addPendingMemberExpiration can remove the pending members, and they cannot
join the group. I think that is what is happening.
For a new member, the Coordinator is already setting a timeout
in completeAndScheduleNextExpiration(group, member, *NewMemberJoinTimeoutMs*).
What is the requirement for one more addPendingMemberExpiration task for
new members?
Is this a possible bug?
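
To make the suspected sequence concrete, here is a toy model of the two
timers. It is only a sketch of the hypothesis in this mail, not Kafka's
GroupCoordinator code: the class PendingMemberRace, the simulate method, the
Outcome names, and the timeout values are all made up for illustration. It
assumes the pending-member expiration fires at the session timeout, the
new-member timeout fires at NewMemberJoinTimeoutMs, and whichever comes first
removes a member that has not finished joining.

```java
// Toy model only: this is NOT Kafka's GroupCoordinator code. It encodes the
// sequence suspected in this mail so the timing can be checked by hand.
final class PendingMemberRace {

    // Hypothetical outcome names, invented for this sketch.
    enum Outcome { JOINED, REMOVED_BY_PENDING_EXPIRATION, REMOVED_BY_NEW_MEMBER_TIMEOUT }

    /**
     * All times are milliseconds since the member's first JoinGroup request.
     * Assumption: the member stays "pending" until joinCompletesAtMs, and the
     * first timer to fire before that point removes it.
     */
    static Outcome simulate(long sessionTimeoutMs,
                            long newMemberJoinTimeoutMs,
                            long joinCompletesAtMs) {
        long firstTimerAtMs = Math.min(sessionTimeoutMs, newMemberJoinTimeoutMs);
        if (joinCompletesAtMs <= firstTimerAtMs) {
            // The join finished before either timer fired.
            return Outcome.JOINED;
        }
        // The join was still in progress when the earlier timer fired.
        return sessionTimeoutMs <= newMemberJoinTimeoutMs
                ? Outcome.REMOVED_BY_PENDING_EXPIRATION
                : Outcome.REMOVED_BY_NEW_MEMBER_TIMEOUT;
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 10_000L;        // example value only
        long newMemberJoinTimeoutMs = 300_000L; // example value only
        for (long joinCompletesAtMs : new long[] {5_000L, 60_000L, 400_000L}) {
            System.out.println(joinCompletesAtMs + " ms -> "
                    + simulate(sessionTimeoutMs, newMemberJoinTimeoutMs, joinCompletesAtMs));
        }
    }
}
```

With these example values, a join that completes after 60 seconds is
reported as REMOVED_BY_PENDING_EXPIRATION even though the longer new-member
timeout has not fired yet, which matches the log lines above if the
hypothesis is right.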