Posted to jira@kafka.apache.org by "A. Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2022/11/07 08:03:00 UTC

[jira] [Commented] (KAFKA-13887) Running multiple instance of same stateful KafkaStreams application on single host raise Exception

    [ https://issues.apache.org/jira/browse/KAFKA-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629647#comment-17629647 ] 

A. Sophie Blee-Goldman commented on KAFKA-13887:
------------------------------------------------

[~cadonna] I think we can close this ticket as "Not a Problem", since the exception here is very much known/intentional?

> Running multiple instance of same stateful KafkaStreams application on single host raise Exception
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13887
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13887
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Sina Askarnejad
>            Priority: Minor
>
> KAFKA-10716 locks the state store directory on the running host, as it stores the processId in a *kafka-streams-process-metadata* file in this path. As a result, to run multiple instances of the same application on a single host, each instance must run with a different *state.dir* config; otherwise the following exception is raised for the second instance:
>  
> Exception in thread "main" org.apache.kafka.streams.errors.StreamsException: Unable to initialize state, this can happen if multiple instances of Kafka Streams are running in the same state directory
> at org.apache.kafka.streams.processor.internals.StateDirectory.initializeProcessId(StateDirectory.java:191)
> at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:868)
> at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:851)
> at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:821)
> at org.apache.kafka.streams.KafkaStreams.<init>(KafkaStreams.java:733)
>  
> The easiest solution is multi-threading: running a single instance with multiple threads. However, multi-threading is not suitable for all scenarios, e.g., when the tasks are CPU intensive, in large-scale scenarios, or when trying to fully utilize multi-core CPUs.
>  
> The second solution is multi-processing. On a single host this needs extra work and supervision, as each instance must be run with a different *state.dir*. It would be a good enhancement if KafkaStreams could handle this config for multiple instances.
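
For illustration, a minimal sketch of the manual workaround described above: giving each process on the host its own *state.dir*. The class name, the `instance-N` directory naming, and the way the instance index is passed in are assumptions made for this example; only the `application.id` and `state.dir` config keys come from Kafka Streams.

```java
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper (not part of Kafka Streams): builds per-instance config
// so that two processes on the same host never share a state directory.
final class PerInstanceStateDir {

    static Properties forInstance(String applicationId, String baseStateDir, int instanceIndex) {
        Properties props = new Properties();
        // Same application.id for all instances so they join the same group.
        props.put("application.id", applicationId);
        // Distinct state.dir per instance; the "instance-N" naming is an assumption.
        props.put("state.dir", Paths.get(baseStateDir, "instance-" + instanceIndex).toString());
        return props;
    }

    public static void main(String[] args) {
        // The instance index is assumed to be supplied by whatever launches the process.
        int instanceIndex = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        Properties props = forInstance("my-app", "/tmp/kafka-streams", instanceIndex);
        System.out.println(props.getProperty("state.dir"));
    }
}
```

The resulting properties would still need the usual settings (bootstrap servers, serdes, and so on) before being passed to a KafkaStreams instance.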
>  
> The proposed solution is that KafkaStreams uses the */{state.dir}/{application.id}/{ordinal.number}* path instead of */{state.dir}/{application.id}* to store the meta file and state. The *ordinal.number* starts at 0 and increments.
> When an instance starts, it checks the ordinal.number directories starting from 0, finds the first subdirectory that is not locked, and uses it as its state directory. This way all tasks are assigned correctly on rebalance and multiple instances can run on a single host.
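
The proposal above could be sketched roughly as follows. This is not how Kafka Streams behaves; it is a hypothetical illustration using standard `java.nio` file locks. The class name, the `.lock` file name, and the `maxInstances` bound are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of the proposed ordinal-directory scheme.
final class OrdinalStateDir implements AutoCloseable {
    final Path dir;
    final int ordinal;
    private final FileChannel channel;
    private final FileLock lock;

    private OrdinalStateDir(Path dir, int ordinal, FileChannel channel, FileLock lock) {
        this.dir = dir;
        this.ordinal = ordinal;
        this.channel = channel;
        this.lock = lock;
    }

    // Scans {appDir}/0, {appDir}/1, ... and claims the first directory whose
    // lock file is not already held. maxInstances is an assumed upper bound.
    static OrdinalStateDir acquire(Path appDir, int maxInstances) throws IOException {
        for (int ordinal = 0; ordinal < maxInstances; ordinal++) {
            Path dir = appDir.resolve(Integer.toString(ordinal));
            Files.createDirectories(dir);
            FileChannel channel = FileChannel.open(
                    dir.resolve(".lock"), StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            FileLock lock;
            try {
                // Returns null when another process already holds the lock.
                lock = channel.tryLock();
            } catch (OverlappingFileLockException e) {
                // Thrown when this same JVM already holds the lock; treat as taken.
                lock = null;
            }
            if (lock != null) {
                return new OrdinalStateDir(dir, ordinal, channel, lock);
            }
            channel.close();
        }
        throw new IOException("No free state directory under " + appDir);
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

An instance would call `OrdinalStateDir.acquire(...)` at startup and keep the returned object open for its lifetime, so the directory stays claimed until the process exits or closes it. The sketch targets separate processes; file-lock semantics are platform dependent, so probing from several threads in one JVM may behave differently at the OS level.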



--
This message was sent by Atlassian Jira
(v8.20.10#820010)