Posted to dev@kafka.apache.org by "Joel Koshy (JIRA)" <ji...@apache.org> on 2016/11/01 06:24:58 UTC

[jira] [Commented] (KAFKA-4362) Offset commits fail after a partition reassignment

    [ https://issues.apache.org/jira/browse/KAFKA-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624520#comment-15624520 ] 

Joel Koshy commented on KAFKA-4362:
-----------------------------------

Btw, the summary doesn't make it clear that this also affects operations such as sync-group/join-group in the new consumer.
I glanced through the new consumer code's handling of unknown errors. Specifically, we will need to rediscover the coordinator to recover from this. It does not appear to do this, but I will double-check tomorrow.
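
To make the recovery idea concrete, here is a minimal sketch of a client that forgets its cached coordinator whenever a commit fails and looks it up again before retrying. Everything in it (the class, the {{CommitResult}} enum, the {{Broker}} and {{CoordinatorLookup}} interfaces) is hypothetical and for illustration only; it is not the new consumer's actual code, and treating an unknown error as a trigger for rediscovery is an assumption about one possible fix.

```java
// Hypothetical sketch: none of these types exist in Kafka.
final class CoordinatorRediscoverySketch {
    enum CommitResult { OK, NOT_COORDINATOR, UNKNOWN_ERROR }

    // Stand-in for a broker that may or may not still host the group's offsets partition.
    interface Broker { CommitResult commit(long offset); }

    // Stand-in for a "find the group coordinator" request.
    interface CoordinatorLookup { Broker find(); }

    private final CoordinatorLookup lookup;
    private Broker coordinator;   // null means "coordinator unknown"
    private int lookups = 0;

    CoordinatorRediscoverySketch(CoordinatorLookup lookup) {
        this.lookup = lookup;
    }

    int lookupCount() {
        return lookups;
    }

    // Tries to commit, rediscovering the coordinator after any failed attempt.
    boolean commitWithRediscovery(long offset, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (coordinator == null) {
                coordinator = lookup.find();
                lookups++;
            }
            CommitResult result = coordinator.commit(offset);
            if (result == CommitResult.OK) {
                return true;
            }
            // Assumption: an unknown error is handled like a coordinator move,
            // so the cached coordinator is dropped and looked up again.
            coordinator = null;
        }
        return false;
    }
}
```

With a lookup that first returns the old coordinator and then the new one, the second attempt succeeds without restarting anything.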

> Offset commits fail after a partition reassignment
> --------------------------------------------------
>
>                 Key: KAFKA-4362
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4362
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.1.0
>            Reporter: Joel Koshy
>            Assignee: Jiangjie Qin
>
> When a partition reassignment of the consumer offsets topic completes, an offset commit shows this:
> {code}
> java.lang.IllegalArgumentException: Message format version for partition 100 not found
>     at kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633) ~[kafka_2.10.jar:?]
>     at kafka.coordinator.GroupMetadataManager$$anonfun$14.apply(GroupMetadataManager.scala:633) ~[kafka_2.10.jar:?]
>     at scala.Option.getOrElse(Option.scala:120) ~[scala-library-2.10.4.jar:?]
>     at kafka.coordinator.GroupMetadataManager.kafka$coordinator$GroupMetadataManager$$getMessageFormatVersionAndTimestamp(GroupMetadataManager.scala:632) ~[kafka_2.10.jar:?]
>     at 
> ...
> {code}
> The issue is that the replica has been deleted, so {{GroupMetadataManager.getMessageFormatVersionAndTimestamp}} throws this exception, which propagates as an unknown error.
> Unfortunately, consumers don't handle this error, so their offset commits fail.
> One workaround in the above situation is to bounce the cluster - the consumer will be forced to rediscover the group coordinator.
> (Incidentally, the message incorrectly prints the number of partitions instead of the actual partition.)
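
The failure mode described in the issue can be sketched roughly as follows: a lookup keyed by partition finds nothing because the local replica is gone, and the fallback throws. This is a simplified Java stand-in, not the Scala code in {{GroupMetadataManager}}; the class name, the method name, and the map of local replicas are assumptions made for illustration. The message here includes the partition that was actually looked up, which is what the incidental note above suggests it should print.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a simplified stand-in for the broker-side lookup.
final class MessageFormatLookupSketch {
    // localReplicas maps offsets-topic partition -> message format version
    // for the replicas this broker still hosts (an assumed representation).
    static byte formatVersionFor(Map<Integer, Byte> localReplicas, int partition) {
        return Optional.ofNullable(localReplicas.get(partition))
            .orElseThrow(() -> new IllegalArgumentException(
                // Report the partition being looked up, not the partition count.
                "Message format version for partition " + partition + " not found"));
    }
}
```

After a reassignment removes a partition from the map, a lookup for it throws {{IllegalArgumentException}}, and unless the caller maps that to a specific error the client only sees a generic failure.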



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)