Posted to jira@kafka.apache.org by "Guozhang Wang (Jira)" <ji...@apache.org> on 2021/02/24 04:12:00 UTC
[jira] [Commented] (KAFKA-12370) Refactor KafkaStreams exposed metadata hierarchy

    [ https://issues.apache.org/jira/browse/KAFKA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289642#comment-17289642 ] 

Guozhang Wang commented on KAFKA-12370:
---------------------------------------

Note that this is related to the augmented topology description proposed in https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148648762.

> Refactor KafkaStreams exposed metadata hierarchy
> ------------------------------------------------
>
>                 Key: KAFKA-12370
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12370
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>
> Currently in KafkaStreams we have three groups of metadata getters:
> 1.
> {code}
> allMetadata
> allMetadataForStore
> {code}
> Return a collection of {{StreamsMetadata}}, which contains only the active/standby partitions plus the hostInfo, and does not expose any task info.
> 2.
> {code}
> queryMetadataForKey
> {code}
> Returns {{KeyQueryMetadata}}, which includes the hostInfos of the active and standby hosts, plus the partition id.
> 3.
> {code}
> localThreadsMetadata
> {code}
> Returns a set of {{ThreadMetadata}}, each of which includes a collection of {{TaskMetadata}} for active and standby tasks.
> All the above functions are used for interactive queries, but the metadata they expose differ considerably, and some use cases need client, thread, and task metadata together to build a feature. At the same time, we may have a more dynamic "task -> thread" mapping in the future, and embedded clients such as consumers may become per client rather than per thread.
> ---------------
> Rethinking the metadata, I feel we can have a more consistent hierarchy, as follows:
> * {{StreamsMetadata}} represents the metadata for the client, which includes the set of {{ThreadMetadata}} for its existing threads and the set of {{TaskMetadata}} for active and standby tasks assigned to this client, plus client metadata including hostInfo and embedded client ids.
> * {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for currently assigned tasks.
> * {{TaskMetadata}} includes the name (including the sub-topology id and the partition id), the state, the corresponding sub-topology description (including the state store names, source topic names).
> * {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed from queryMetadataForKey) return a set of {{StreamsMetadata}}, and {{localMetadata}} (renamed from localThreadsMetadata) returns a single {{StreamsMetadata}}.
> To illustrate with an example: to find the current active or standby hosts of a specific store, we would call {{allMetadataForStore}}, loop over the active / standby {{TaskMetadata}} contained in each returned {{StreamsMetadata}}, and filter by the store names in the corresponding sub-topology's description.
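
The lookup described above could be sketched roughly as follows. This is only an illustration of the proposed hierarchy, not the existing Kafka Streams API: the classes {{StreamsMetadata}} and {{TaskMetadata}} here are simplified, hypothetical stand-ins, the field names and the {{hostsForStore}} and {{sampleClients}} helpers are assumptions, {{ThreadMetadata}} is left out for brevity, and the sub-topology description is reduced to a set of state store names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-ins for the proposed hierarchy; NOT the real Kafka Streams classes.
class MetadataHierarchySketch {

    // Task level: the id is "<subTopologyId>_<partition>"; the store names stand in
    // for the sub-topology description mentioned in the proposal.
    static final class TaskMetadata {
        final String taskId;
        final Set<String> stateStoreNames;

        TaskMetadata(String taskId, Set<String> stateStoreNames) {
            this.taskId = taskId;
            this.stateStoreNames = stateStoreNames;
        }
    }

    // Client level: host info plus the active and standby tasks assigned to this client.
    static final class StreamsMetadata {
        final String hostInfo;
        final List<TaskMetadata> activeTasks;
        final List<TaskMetadata> standbyTasks;

        StreamsMetadata(String hostInfo, List<TaskMetadata> activeTasks, List<TaskMetadata> standbyTasks) {
            this.hostInfo = hostInfo;
            this.activeTasks = activeTasks;
            this.standbyTasks = standbyTasks;
        }
    }

    // Loops over each client's active (or standby) tasks and keeps the hosts
    // that have at least one task whose sub-topology contains the store.
    static Set<String> hostsForStore(Collection<StreamsMetadata> clients, String storeName, boolean active) {
        Set<String> hosts = new TreeSet<>();
        for (StreamsMetadata client : clients) {
            List<TaskMetadata> tasks = active ? client.activeTasks : client.standbyTasks;
            for (TaskMetadata task : tasks) {
                if (task.stateStoreNames.contains(storeName)) {
                    hosts.add(client.hostInfo);
                    break;
                }
            }
        }
        return hosts;
    }

    // Made-up sample data: three clients, two sub-topologies with one store each.
    static List<StreamsMetadata> sampleClients() {
        TaskMetadata counts0 = new TaskMetadata("0_0", Set.of("counts"));
        TaskMetadata counts1 = new TaskMetadata("0_1", Set.of("counts"));
        TaskMetadata totals0 = new TaskMetadata("1_0", Set.of("totals"));
        List<StreamsMetadata> clients = new ArrayList<>();
        clients.add(new StreamsMetadata("host-a:8080", List.of(counts0, totals0), List.of(counts1)));
        clients.add(new StreamsMetadata("host-b:8080", List.of(counts1), List.of(counts0)));
        clients.add(new StreamsMetadata("host-c:8080", List.of(), List.of(totals0)));
        return clients;
    }

    public static void main(String[] args) {
        List<StreamsMetadata> clients = sampleClients();
        System.out.println("active hosts for 'counts':  " + hostsForStore(clients, "counts", true));
        System.out.println("standby hosts for 'totals': " + hostsForStore(clients, "totals", false));
    }
}
```

With a single client-level type carrying the task metadata, the same loop answers both the active and the standby question; only the task collection being scanned changes.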



--
This message was sent by Atlassian Jira
(v8.3.4#803005)