Posted to jira@kafka.apache.org by "Tom Bentley (Jira)" <ji...@apache.org> on 2020/06/26 16:13:00 UTC

[jira] [Commented] (KAFKA-10206) Admin can transiently return incorrect results about topics

    [ https://issues.apache.org/jira/browse/KAFKA-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146444#comment-17146444 ] 

Tom Bentley commented on KAFKA-10206:
-------------------------------------

I think the broker needs some alternative to replying with invalid data until it has received topic data from the controller.
There are no meaningful existing retriable errors which the broker could return (and older clients would not expect these).
The broker could delay responding to a metadata request until it had received topic data from the controller. That's complicated by the fact that the initial UPDATE_METADATA request from the controller lacks topic data. While using a counter of received UPDATE_METADATA requests would work most of the time, it is not safe if the controller doesn't send the second UPDATE_METADATA request (e.g. due to controller failover). An alternative to using a counter would be to distinguish in the UPDATE_METADATA request between an empty topic list and a null topic list.
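
To make the null-versus-empty idea concrete, here is a minimal sketch. It is not Kafka's actual broker code: the class TopicMetadataSketch and its methods are hypothetical, a null list stands in for "this UPDATE_METADATA carried no topic data", and an empty list stands in for "the cluster really has no topics". A metadata request waits for a bounded time until the first non-null list arrives.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a broker-side metadata cache (not Kafka's real class).
final class TopicMetadataSketch {
    // Completed once, when the first update carrying a non-null topic list arrives.
    private final CompletableFuture<Boolean> ready = new CompletableFuture<>();
    private volatile List<String> topics = null;

    // Assumption: null means "no topic data in this update",
    // while an empty list means "there are genuinely no topics".
    void onUpdateMetadata(List<String> topicsOrNull) {
        if (topicsOrNull == null) {
            return; // nothing learned about topics; stay "not ready"
        }
        topics = List.copyOf(topicsOrNull);
        ready.complete(Boolean.TRUE);
    }

    // Waits up to maxWaitMs for the first topic data.
    // An empty Optional means "still not ready", which is distinct from an empty topic list.
    Optional<List<String>> listTopics(long maxWaitMs) {
        try {
            ready.get(maxWaitMs, TimeUnit.MILLISECONDS);
            return Optional.of(topics);
        } catch (TimeoutException | ExecutionException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }
}
```

Unlike a counter, this does not depend on how many UPDATE_METADATA requests the controller sends, only on whether one of them has carried a topic list.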

Then there's the question of how long the broker should wait before responding. The alternative to waiting would be to return some new retriable error code to the client (which could then try another broker).
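
As a rough illustration of the error-code alternative, the sketch below shows how a client might move on to another broker. Everything in it is hypothetical: MetadataFailoverSketch is not part of the Kafka client, each broker is modelled as a plain Supplier, and an empty Optional stands in for the proposed "metadata not ready" retriable error.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical client-side helper; not part of the Kafka Admin client.
final class MetadataFailoverSketch {
    // Each supplier models one broker's reply to a metadata request.
    // Assumption: Optional.empty() represents the new retriable error,
    // and a present (possibly empty) list is an authoritative answer.
    static Optional<List<String>> listTopics(List<Supplier<Optional<List<String>>>> brokers) {
        for (Supplier<Optional<List<String>>> broker : brokers) {
            Optional<List<String>> reply = broker.get();
            if (reply.isPresent()) {
                return reply; // first broker that has topic data wins
            }
            // otherwise this broker is still starting up; try the next one
        }
        return Optional.empty(); // no broker was ready; the caller may retry later
    }
}
```

This keeps the broker's response time short, at the cost of requiring clients that understand the new error.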

[~ijuma], [~cmccabe] do you have any better ideas about how best to address this?

> Admin can transiently return incorrect results about topics
> -----------------------------------------------------------
>
>                 Key: KAFKA-10206
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10206
>             Project: Kafka
>          Issue Type: Bug
>          Components: admin, core
>            Reporter: Tom Bentley
>            Assignee: Tom Bentley
>            Priority: Major
>
> When a broker starts up it can handle metadata requests before it has 
> received UPDATE_METADATA requests from the controller. 
> This manifests in the admin client via:
> * listTopics returning an empty list
> * describeTopics and describeConfigs of topics erroneously returning UnknownTopicOrPartitionException
> I assume this also affects the producer and consumer, though since `UnknownTopicOrPartitionException` is retriable those clients recover.
> Testing locally suggests that the window for this happening is typically <1s.
> There doesn't seem to be any way for the caller of the Admin client to detect this situation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)