Posted to jira@kafka.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/04/02 01:26:00 UTC

[jira] [Commented] (KAFKA-9801) Static member could get empty assignment unexpectedly

    [ https://issues.apache.org/jira/browse/KAFKA-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073291#comment-17073291 ] 

ASF GitHub Bot commented on KAFKA-9801:
---------------------------------------

guozhangwang commented on pull request #8405: KAFKA-9801: Still trigger rebalance when static member joins in CompletingRebalance phase [WIP]
URL: https://github.com/apache/kafka/pull/8405
 
 
   1. Fix the direct cause of the observed issue on the client side: when a heartbeat gets an error and resets the generation, we only need to set the state to UNJOINED if it was not already REBALANCING; otherwise, the join-group handler would throw the retriable UnjoinedGroupException and force the consumer to re-send the join-group request unnecessarily.
   
   2. Fix the root cause of the issue on the broker side: we should still trigger a rebalance when a static member joins in the CompletingRebalance phase; otherwise the member's member.id could change before the assignment is received from the leader, causing the assignment for the new member.id to be empty.
   
   3. Added log4j entries as a by-product of my investigation.
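The client-side change in item 1 amounts to a small state check when a heartbeat error resets the generation. The following is a minimal sketch under simplifying assumptions, not the actual consumer coordinator code: the `UNJOINED` and `REBALANCING` state names come from the description above, while the class name, the `STABLE` state, and the `NO_GENERATION` sentinel of -1 are assumptions for illustration.

```java
public class HeartbeatResetSketch {
    // Simplified member states; UNJOINED and REBALANCING follow the PR text,
    // STABLE is an assumed third state for illustration.
    enum MemberState { UNJOINED, REBALANCING, STABLE }

    // Assumed sentinel meaning "no valid generation".
    static final int NO_GENERATION = -1;

    private MemberState state;
    private int generationId;

    HeartbeatResetSketch(MemberState state, int generationId) {
        this.state = state;
        this.generationId = generationId;
    }

    // Called when a heartbeat fails with an error that invalidates the generation.
    void resetGenerationOnHeartbeatError() {
        generationId = NO_GENERATION;
        // Only fall back to UNJOINED if no join-group is in flight. If the member
        // is already REBALANCING, flipping it to UNJOINED would make the pending
        // join-group response look "unjoined" and force a needless re-send.
        if (state != MemberState.REBALANCING) {
            state = MemberState.UNJOINED;
        }
    }

    MemberState state() { return state; }

    int generationId() { return generationId; }

    public static void main(String[] args) {
        HeartbeatResetSketch stable = new HeartbeatResetSketch(MemberState.STABLE, 5);
        stable.resetGenerationOnHeartbeatError();
        System.out.println(stable.state() + " " + stable.generationId());

        HeartbeatResetSketch rebalancing = new HeartbeatResetSketch(MemberState.REBALANCING, 5);
        rebalancing.resetGenerationOnHeartbeatError();
        System.out.println(rebalancing.state() + " " + rebalancing.generationId());
    }
}
```

In this model a STABLE member ends up UNJOINED, while a REBALANCING member keeps its state and only loses its generation, so its in-flight join-group can complete normally.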
   
   Test coverage is still in progress.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 


> Static member could get empty assignment unexpectedly
> -----------------------------------------------------
>
>                 Key: KAFKA-9801
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9801
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, streams
>    Affects Versions: 2.4.0
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Major
>
> Take the following example trace where static members are joining the group:
> 1. Static member with instance A joined the group with an empty member.id; the coordinator generated member.id 1 for A and added it to the group. The group state is PreparingRebalance.
> 2. The group is formed and now we move on to CompletingRebalance.
> 3. Another member joins the group, causing it to transition back to PreparingRebalance, which could also send a REBALANCE_IN_PROGRESS error to member A.
> 4. Member A gets the REBALANCE_IN_PROGRESS error and tries to re-join (again with an empty member.id).
> 5. The group advances to CompletingRebalance again.
> 6. The group receives the second join-group request from the known instance A with an empty member.id, generates a new member.id 2, and replaces member.id 1.
> 7. The group receives the assignment from the leader, which only includes member.id 1 and not member.id 2.
> 8. The assignment for member.id 1 is dropped on the broker side while the assignment for member.id 2 is set to an empty byte array.
> 9. The empty byte array is sent back to instance A, causing the following error:
> {code}
> [2020-03-27T21:13:01-05:00] (streams-soak-2-5_soak_i-054b83e98b7ed6285_streamslog) org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'version': java.nio.BufferUnderflowException
> 	at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110)
> {code}
> This error is only triggered when several conditions line up at once, so it has not occurred very frequently.
> Personally, I think this error may also be correlated with the observed https://issues.apache.org/jira/browse/KAFKA-9659, which I will keep investigating (and will update this ticket).
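The trace above can be replayed with a toy model of the coordinator's bookkeeping, with and without the broker-side fix proposed in PR #8405. This is a minimal sketch, not the GroupCoordinator implementation: the class, the `rebalanceOnStaticRejoin` flag, the collapsing of steps 2 to 5 into one state change, and the assumption that a decoded assignment begins with a 2-byte version field are all illustrative. Returning `null` from `syncGroup` stands in for the coordinator rejecting the stale sync; the actual error code is not modeled.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class StaticRejoinTrace {
    enum GroupState { PREPARING_REBALANCE, COMPLETING_REBALANCE, STABLE }

    // Hypothetical, heavily simplified coordinator state.
    final Map<String, String> instanceToMemberId = new HashMap<>();
    final boolean rebalanceOnStaticRejoin; // true models the broker-side fix
    GroupState state = GroupState.PREPARING_REBALANCE;
    int nextMemberId = 1;

    StaticRejoinTrace(boolean rebalanceOnStaticRejoin) {
        this.rebalanceOnStaticRejoin = rebalanceOnStaticRejoin;
    }

    // Steps 1 and 6: a static member joining with an empty member.id gets a fresh
    // member.id, which replaces any member.id that instance previously held.
    String joinWithEmptyMemberId(String instanceId) {
        String memberId = String.valueOf(nextMemberId++);
        instanceToMemberId.put(instanceId, memberId);
        if (state == GroupState.COMPLETING_REBALANCE && rebalanceOnStaticRejoin) {
            // The fix: go back to PreparingRebalance so the leader recomputes
            // the assignment with the new member.id.
            state = GroupState.PREPARING_REBALANCE;
        }
        return memberId;
    }

    // Steps 7 and 8: apply the leader's assignment and return what the instance
    // receives, or null if the group is no longer in CompletingRebalance.
    byte[] syncGroup(Map<String, byte[]> leaderAssignment, String instanceId) {
        if (state != GroupState.COMPLETING_REBALANCE) {
            return null;
        }
        state = GroupState.STABLE;
        // A current member.id missing from the leader's assignment gets empty bytes.
        return leaderAssignment.getOrDefault(instanceToMemberId.get(instanceId), new byte[0]);
    }

    // Step 9: decoding starts with a 2-byte version field (assumed layout),
    // so an empty assignment underflows immediately.
    static short readVersion(byte[] assignment) {
        return ByteBuffer.wrap(assignment).getShort();
    }

    // Replays steps 1 to 8 and returns the bytes instance A would receive.
    static byte[] runTrace(boolean withFix) {
        StaticRejoinTrace group = new StaticRejoinTrace(withFix);
        String first = group.joinWithEmptyMemberId("A");   // step 1: member.id 1
        group.state = GroupState.COMPLETING_REBALANCE;      // steps 2 to 5, collapsed
        Map<String, byte[]> leaderAssignment = new HashMap<>();
        leaderAssignment.put(first, new byte[] {0, 1});     // leader only knows member.id 1
        group.joinWithEmptyMemberId("A");                   // step 6: member.id 2 replaces 1
        return group.syncGroup(leaderAssignment, "A");      // steps 7 and 8
    }

    public static void main(String[] args) {
        byte[] received = runTrace(false);
        System.out.println("without fix: " + received.length + " bytes");
        try {
            readVersion(received);
        } catch (BufferUnderflowException e) {
            System.out.println("BufferUnderflowException, as in the reported stack trace");
        }
        System.out.println("with fix: sync rejected = " + (runTrace(true) == null));
    }
}
```

Without the fix, instance A receives a 0-byte assignment and decoding underflows, mirroring the reported SchemaException. With the fix, the stale sync is rejected instead, after which the member would rejoin and the leader would compute an assignment that includes member.id 2.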



--
This message was sent by Atlassian Jira
(v8.3.4#803005)