Posted to dev@kafka.apache.org by Dibyendu Bhattacharya <di...@gmail.com> on 2020/08/09 13:54:26 UTC

Consumer Rebalance issue and follow-up on KAFKA-9752

Hi,

This is regarding https://issues.apache.org/jira/browse/KAFKA-9752, where
a consumer rebalance can get stuck after a new member times out with an old
JoinGroup version.

We have taken the fix, but now see a different issue.

Earlier the consumer group was stuck in the "PendingRebalance" state. That
no longer happens, but now members are unable to join the group. The logs
below show members being removed after session timeout:



[2020-08-09 09:29:00,558] INFO [GroupCoordinator 5]: Pending member XXX in
group YYY  has been removed after session timeout expiration.
(kafka.coordinator.group.GroupCoordinator)

[2020-08-09 09:29:55,856] INFO [GroupCoordinator 5]: Pending member ZZZ in
group YYY  has been removed after session timeout expiration.
(kafka.coordinator.group.GroupCoordinator)



Looking at the GroupCoordinator code, when a new member tries to join for
the first time, GroupCoordinator also schedules addPendingMemberExpiration
(in the doUnknownJoinGroup call) with *SessionTimeOut*.



If for some reason the addMemberAndRebalance call takes longer than
SessionTimeOut and members are still in the “Pending” state, the
addPendingMemberExpiration task above can remove the pending members, and
they cannot join the group. I think that is what is happening.
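
The suspected sequence can be sketched as a toy model. This is only an
illustration of the hypothesis above, not the actual GroupCoordinator code:
the class PendingMemberRace and its methods addPending, expirePending,
completeJoin and simulate are hypothetical names, and the two timings are
passed in as plain numbers instead of being driven by real timers.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the suspected race.
// It does NOT reproduce the real GroupCoordinator logic.
public class PendingMemberRace {
    private final Set<String> pendingMembers = new HashSet<>();
    private final Set<String> members = new HashSet<>();

    // First join attempt of a new member: it is tracked as pending.
    void addPending(String memberId) {
        pendingMembers.add(memberId);
    }

    // Models the pending-member expiration task firing after the session timeout.
    void expirePending(String memberId) {
        if (pendingMembers.remove(memberId)) {
            System.out.println("Pending member " + memberId
                + " has been removed after session timeout expiration.");
        }
    }

    // Models the join completing; it succeeds only if the member is still pending.
    boolean completeJoin(String memberId) {
        if (pendingMembers.remove(memberId)) {
            members.add(memberId);
            return true;
        }
        return false;
    }

    // Returns true if the member ends up in the group for the given timings.
    static boolean simulate(long sessionTimeoutMs, long joinDurationMs) {
        PendingMemberRace group = new PendingMemberRace();
        group.addPending("member-1");
        if (joinDurationMs > sessionTimeoutMs) {
            // The expiration task wins the race against the slow join.
            group.expirePending("member-1");
        }
        return group.completeJoin("member-1");
    }

    public static void main(String[] args) {
        System.out.println(simulate(10_000, 2_000));   // join faster than the timeout
        System.out.println(simulate(10_000, 15_000));  // join slower than the timeout
    }
}
```

In this model a join that outlasts the session timeout always loses the
member, which matches the "Pending member ... has been removed" log lines
if the hypothesis is right.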



For a new member, the coordinator already sets a timeout in
completeAndScheduleNextExpiration(group, member, *NewMemberJoinTimeoutMs*).



Why is an additional addPendingMemberExpiration task needed for new
members?


Is this a possible bug?