Posted to jira@kafka.apache.org by "Randall Hauch (Jira)" <ji...@apache.org> on 2020/06/11 15:26:00 UTC
[jira] [Resolved] (KAFKA-9374) Worker can be disabled by blocked connectors
[ https://issues.apache.org/jira/browse/KAFKA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Randall Hauch resolved KAFKA-9374.
----------------------------------
Reviewer: Konstantine Karantasis
Resolution: Fixed
Merged to `trunk` and backported to the `2.6` branch for inclusion in 2.6.0.
> Worker can be disabled by blocked connectors
> --------------------------------------------
>
> Key: KAFKA-9374
> URL: https://issues.apache.org/jira/browse/KAFKA-9374
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1
> Reporter: Chris Egerton
> Assignee: Chris Egerton
> Priority: Major
> Fix For: 2.6.0
>
>
> If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}}, {{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}} methods, the worker will be disabled for some types of requests thereafter, including connector creation, connector reconfiguration, and connector deletion.
> -This only occurs in distributed mode and is due to the threading model used by the [DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java] class.- This affects both distributed and standalone mode. Distributed herders perform some connector work synchronously in their {{tick}} thread, which also handles group membership and some REST requests. The majority of the herder methods for the standalone herder are {{synchronized}}, including those for creating, updating, and deleting connectors; as long as one of those methods blocks, all subsequent calls to any of these methods will also be blocked.
>
> One potential solution could be to treat connectors that fail to start, stop, etc. in time similarly to tasks that fail to stop within the [task graceful shutdown timeout period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126] by handling all connector interactions on a separate thread, waiting for them to complete within a timeout, and abandoning the thread (and transitioning the connector to the {{FAILED}} state, if it has been created at all) if that timeout expires.
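
The timeout-and-abandon idea proposed above could look roughly like the sketch below. It is not the code that was merged for this ticket and uses no Kafka Connect APIs: the class {{ConnectorCallGuard}}, the method {{callWithTimeout}}, and the {{Outcome}} enum are hypothetical names chosen for illustration, and a plain {{Callable}} stands in for a connector method such as {{start}} or {{validate}}.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only; not part of Kafka Connect.
final class ConnectorCallGuard {

    // FAILED mirrors the state named in the ticket; COMPLETED is an assumption.
    enum Outcome { COMPLETED, FAILED }

    // Daemon threads, so an abandoned call that never returns cannot keep the JVM alive.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "connector-call");
        t.setDaemon(true);
        return t;
    });

    // Runs the connector interaction on a separate thread and waits at most timeoutMs.
    static Outcome callWithTimeout(Callable<?> connectorCall, long timeoutMs) {
        Future<?> future = EXECUTOR.submit(connectorCall);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.COMPLETED;
        } catch (TimeoutException e) {
            // Best-effort interrupt; a call that ignores interrupts stays blocked,
            // but the caller (the herder, in the ticket's terms) is free to continue.
            future.cancel(true);
            return Outcome.FAILED;
        } catch (ExecutionException e) {
            // The connector method threw instead of hanging.
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            // The waiting thread itself was interrupted; restore the flag and give up.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        Outcome quick = callWithTimeout(() -> "started", 1000);
        Outcome hung = callWithTimeout(() -> { Thread.sleep(60_000); return null; }, 100);
        System.out.println(quick + " " + hung);
    }
}
```

The caller blocks only for the timeout, so one stuck connector no longer stalls the thread that serves other requests. The trade-off is that an abandoned thread may leak if the connector code never responds to interruption.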
--
This message was sent by Atlassian Jira
(v8.3.4#803005)