Posted to jira@kafka.apache.org by "A. Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2020/11/13 02:46:00 UTC
[jira] [Commented] (KAFKA-10716) Streams processId is unstable across restarts resulting in task mass migration

    [ https://issues.apache.org/jira/browse/KAFKA-10716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231116#comment-17231116 ] 

A. Sophie Blee-Goldman commented on KAFKA-10716:
------------------------------------------------

There are a few possible ways forward here:

1) Generate the processId from the client.id config, if specified. This requires users to set this config and ensure that it's unique to each instance.
2) Generate the processId from the group.instance.id, if specified. This would only work for users of static membership.
3) Write the processId to, and load it from, the checkpoint file in each task directory.
4) Write the processId to, and load it from, a single file in the top-level application directory.

Both 1 & 2 would be simple for us to implement, but somewhat obnoxious to require of a user just for basic functionality of their app. That said, if a user has already specified either the client.id or group.instance.id, I don't see any reason _not_ to generate the processId from that. This might be a good stop-gap measure, but not a good permanent solution. However, if we plan to implement KAFKA-10121 right away, then maybe it's best not to mess around with options 3 or 4.
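To make options 1 and 2 concrete, here is a minimal sketch of deriving a stable processId from an already-configured id. The class and method names (ProcessIds, fromConfiguredId) are hypothetical and not existing Streams code. It assumes a name-based UUID is acceptable as a processId and falls back to a random UUID when nothing is configured, which matches the current behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of Kafka Streams.
final class ProcessIds {
    private ProcessIds() {}

    // configuredId stands in for the client.id or group.instance.id value, if the user set one.
    static UUID fromConfiguredId(String configuredId) {
        if (configuredId == null || configuredId.isEmpty()) {
            // Nothing configured: fall back to a random UUID, which changes on every restart.
            return UUID.randomUUID();
        }
        // Name-based (type 3) UUID: the same input bytes always produce the same UUID.
        return UUID.nameUUIDFromBytes(configuredId.getBytes(StandardCharsets.UTF_8));
    }
}
```

With this, ProcessIds.fromConfiguredId("instance-1") returns the same UUID on every restart, provided the configured id really is unique to the instance.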

Options 3 and 4 would be a bit trickier. Option 3 in particular seems to open up a lot of nasty possibilities, like the processId differing from one task directory to another, or even between threads in the same app. But Option 4 seems pretty clean: we load the processId file within the KafkaStreams constructor, and if it's not found, we generate a random UUID like we do now. This would all happen before any threads are created, so there's no need to worry about synchronizing them at all.
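A rough sketch of what option 4 might look like is below. The class name, the method name, and the "process-id" file name are all assumptions made up for illustration, and the real change would need to decide how to handle a corrupt file and how this interacts with directory locking. The idea is simply: read the UUID if the file exists, otherwise generate one and persist it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper for illustration only; not part of Kafka Streams.
final class ProcessIdFile {
    // Assumed file name; the actual name and format are not specified in this ticket.
    private static final String FILE_NAME = "process-id";

    private ProcessIdFile() {}

    // Intended to be called once, before any stream threads are created,
    // so no synchronization is attempted here.
    static UUID loadOrCreate(Path appDir) {
        Path file = appDir.resolve(FILE_NAME);
        try {
            if (Files.exists(file)) {
                // Reuse the persisted id. UUID.fromString throws
                // IllegalArgumentException if the contents are not a valid UUID.
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                return UUID.fromString(text);
            }
            // First start (or wiped directory): generate a random UUID as today, then persist it.
            UUID id = UUID.randomUUID();
            Files.createDirectories(appDir);
            Files.write(file, id.toString().getBytes(StandardCharsets.UTF_8));
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ProcessIdFile.loadOrCreate twice with the same directory returns the same UUID, so a bounce keeps its processId as long as the application directory survives the restart.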

> Streams processId is unstable across restarts resulting in task mass migration
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-10716
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10716
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> The new high availability feature of KIP-441 relies on deterministic assignment to produce an eventually-stable assignment. The HighAvailabilityTaskAssignor assigns tasks based on the unique processId assigned to each client, so if the same set of Kafka Streams applications participates in a rebalance, it should generate the same task assignment every time.
> Unfortunately the processIds aren't stable across restarts. We generate a random UUID in the KafkaStreams constructor, so each time the process starts up it is assigned a completely different processId. Unless this new processId happens to fall in exactly the same position in the ordering as the previous one, a single bounce or crash/restart can result in a large-scale shuffling of tasks based on a completely different eventual assignment.
> Ultimately we should fix this via KAFKA-10121, but that's a nontrivial undertaking, and this bug merits some immediate relief if we don't intend to tackle the larger problem in the upcoming releases.


