Posted to jira@kafka.apache.org by "Tom Bentley (Jira)" <ji...@apache.org> on 2020/01/07 08:57:00 UTC

[jira] [Commented] (KAFKA-9374) Worker can be disabled by blocked connectors

    [ https://issues.apache.org/jira/browse/KAFKA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009522#comment-17009522 ] 

Tom Bentley commented on KAFKA-9374:
------------------------------------

"Abandoning" the thread isn't really a solution. You cannot be sure that the thread will ever die, and if you abandon enough threads it starts to look like a resource leak. Also, any connector that hangs in one of those methods is buggy, and that bug will come to light (and potentially be fixed) sooner if the connector fails in a way the end user is likely to notice. If we paper over the cracks like this, isn't the end user more likely to just try recreating the connector (which may work some of the time), potentially letting the bug live longer?
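The point that an abandoned thread may never die can be illustrated with a small, self-contained Java sketch. It assumes a hypothetical buggy connector method that swallows `InterruptedException`; the class and method names (`AbandonedThreadDemo`, `workerAliveAfterAbandoning`) are illustrative only and are not part of Kafka Connect. Timing out on the `Future`, calling `cancel(true)`, and even `shutdownNow()` only request interruption, so the thread stays alive.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

final class AbandonedThreadDemo {
    /**
     * Hypothetical demo: submits a simulated hung connector call that swallows
     * interrupts, gives up after timeoutMs, cancels it, and reports whether its
     * thread is still alive afterwards.
     */
    static boolean workerAliveAfterAbandoning(long timeoutMs) {
        AtomicBoolean release = new AtomicBoolean(false);
        AtomicReference<Thread> worker = new AtomicReference<>();
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hung-connector");
            t.setDaemon(true); // lets the JVM exit even if this thread never dies
            worker.set(t);     // remember the thread so we can inspect it later
            return t;
        });
        Future<?> future = executor.submit(() -> {
            while (!release.get()) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException ignored) {
                    // Buggy connector: swallows the interrupt and keeps going.
                }
            }
        });
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only *requests* interruption
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        executor.shutdownNow(); // also only interrupts; it cannot kill the thread
        try {
            worker.get().join(200); // give the thread a chance to exit; it won't
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean alive = worker.get().isAlive();
        release.set(true); // end the demo thread so nothing actually leaks here
        return alive;
    }
}
```

In a real worker nothing would ever set the equivalent of `release`, so each abandoned call would leave one such thread running for the lifetime of the JVM.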

> Worker can be disabled by blocked connectors
> --------------------------------------------
>
>                 Key: KAFKA-9374
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9374
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>
> If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}}, {{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}} methods, the worker will be disabled for some types of requests thereafter, including connector creation, connector reconfiguration, and connector deletion.
>  This only occurs in distributed mode and is due to the threading model used by the [DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java] class.
>  
> One potential solution could be to treat connectors that fail to start, stop, etc. in time similarly to tasks that fail to stop within the [task graceful shutdown timeout period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126] by handling all connector interactions on a separate thread, waiting for them to complete within a timeout, and abandoning the thread (and transitioning the connector to the {{FAILED}} state, if it has been created at all) if that timeout expires.
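The proposed approach (run each connector interaction on a separate thread, wait up to a timeout, and on expiry abandon the thread and mark the connector FAILED) might look roughly like the Java sketch below. This is a hypothetical illustration, not the `DistributedHerder` implementation: `ConnectorCallRunner`, `callWithTimeout`, and the two-value `State` enum are invented for this example, and treating an exception thrown by the connector method as FAILED is an added assumption beyond what the issue describes.

```java
import java.util.Optional;
import java.util.concurrent.*;

final class ConnectorCallRunner {
    // Hypothetical, simplified connector states for this sketch only.
    enum State { RUNNING, FAILED }

    private volatile State state = State.RUNNING;

    State state() { return state; }

    /**
     * Runs one connector interaction (start, stop, taskConfigs, ...) on its own
     * daemon thread and waits at most timeoutMs for it. On timeout the thread is
     * abandoned (interrupted, but not awaited) and the connector is marked
     * FAILED, so the calling (herder) thread is never blocked indefinitely.
     */
    <T> Optional<T> callWithTimeout(String method, Callable<T> call, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connector-" + method);
            t.setDaemon(true); // an abandoned thread must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; a buggy connector may ignore it
            state = State.FAILED;
            return Optional.empty();
        } catch (ExecutionException e) {
            state = State.FAILED; // assumption: a thrown exception also fails it
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for " + method, e);
        } finally {
            executor.shutdown(); // does not wait for an abandoned thread
        }
    }
}
```

A fresh thread per call is used here so that one hung call cannot block later calls queued behind it on a shared thread. The trade-off raised in the comment above still applies: if the connector ignores the interrupt, each timed-out call leaves a live thread behind.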



--
This message was sent by Atlassian Jira
(v8.3.4#803005)