Posted to jira@kafka.apache.org by "Ravindranath Kakarla (Jira)" <ji...@apache.org> on 2023/04/29 01:29:00 UTC

[jira] [Created] (KAFKA-14952) Publish metrics when source connector fails to poll data

Ravindranath Kakarla created KAFKA-14952:
--------------------------------------------

             Summary: Publish metrics when source connector fails to poll data
                 Key: KAFKA-14952
                 URL: https://issues.apache.org/jira/browse/KAFKA-14952
             Project: Kafka
          Issue Type: Improvement
          Components: KafkaConnect
    Affects Versions: 3.3.2
            Reporter: Ravindranath Kakarla


Currently, there is no metric in Kafka Connect to track when a source connector fails to poll data from the source. This information would be useful to operators and developers to visualize, monitor and alert when the connector fails to poll records from the source.

Existing metrics like `kafka_producer_producer_metrics_record_error_total` and `kafka_connect_task_error_metrics_total_record_failures` only cover failures when producing data to the Kafka cluster, not failures where the source task throws a retriable exception or a ConnectException.

Polling from the source can fail because the source system is unavailable or because of errors in the connect configuration. Currently, this cannot be monitored directly through metrics; operators have to dig through logs instead, which is inconsistent with how other failures are monitored.

I propose adding new metrics to Kafka Connect, "source-record-poll-error-total" and "source-record-poll-error-rate" that can be used to monitor failures during polling.

`source-record-poll-error-total` - The total number of times a source connector failed to poll data from the source. This will include both retryable and non-retryable exceptions.

`source-record-poll-error-rate` - The rate of the above failures per unit of time.

These metrics would be tracked at the connector level and could be exposed through JMX along with the other metrics.
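To make the intended semantics of the two metrics concrete, here is a minimal, dependency-free sketch. It is not Kafka Connect code: `PollErrorMetrics`, its method names, and the injected clock are all hypothetical, and the rate is averaged over the lifetime of the object rather than over the sampled windows that Kafka's own metrics library uses.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

/**
 * Hypothetical stand-in for the proposed metrics. This is an illustration
 * of the semantics only, not one of Kafka Connect's metrics classes.
 */
final class PollErrorMetrics {
    private final LongAdder total = new LongAdder();
    private final LongSupplier clockMs; // injected so the rate can be tested deterministically
    private final long startMs;

    PollErrorMetrics(LongSupplier clockMs) {
        this.clockMs = clockMs;
        this.startMs = clockMs.getAsLong();
    }

    /** Called once for every failed poll, retriable or not. */
    void recordPollError() {
        total.increment();
    }

    /** Corresponds to source-record-poll-error-total. */
    long pollErrorTotal() {
        return total.sum();
    }

    /**
     * Corresponds to source-record-poll-error-rate, expressed as failures
     * per second since this object was created (a simplifying assumption).
     */
    double pollErrorRate() {
        long elapsedMs = clockMs.getAsLong() - startMs;
        if (elapsedMs <= 0) {
            return 0.0;
        }
        return total.sum() * 1000.0 / elapsedMs;
    }
}
```

In a real implementation these two values would presumably be registered with the existing Connect metrics machinery instead of being computed by hand.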

I am willing to submit a PR if this looks good. Sample implementation code is below:

 
{code:java}
// AbstractWorkerSourceTask.java

protected List<SourceRecord> poll() throws InterruptedException {
    try {
        return task.poll();
    } catch (RetriableException | org.apache.kafka.common.errors.RetriableException e) {
        log.warn("{} failed to poll records from SourceTask. Will retry operation.", this, e);
        sourceTaskMetricsGroup.recordPollError();
        // Do nothing. Let the framework poll whenever it's ready.
        return null;
    } catch (Throwable e) {
        sourceTaskMetricsGroup.recordPollError();
        throw e;
    }
}
{code}
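The control flow above can also be exercised in isolation. The following is a simplified, self-contained sketch under stated assumptions: `PollSketch`, its nested `RetriableException`, and the `Source` interface are hypothetical stand-ins for the Connect classes, records are plain strings, and only unchecked exceptions are modelled. It shows that both the retriable path (swallowed, returns null) and the fatal path (rethrown) increment the same counter.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class PollSketch {
    /** Hypothetical local stand-in for Connect's RetriableException. */
    static final class RetriableException extends RuntimeException {
        RetriableException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for SourceTask.poll(); records are plain strings here. */
    interface Source {
        List<String> poll();
    }

    private final Source source;
    private final AtomicLong pollErrors = new AtomicLong();

    PollSketch(Source source) {
        this.source = source;
    }

    List<String> poll() {
        try {
            return source.poll();
        } catch (RetriableException e) {
            // Retriable failure: count it, then let the caller poll again later.
            pollErrors.incrementAndGet();
            return null;
        } catch (RuntimeException e) {
            // Non-retriable failure: count it, then propagate to the caller.
            pollErrors.incrementAndGet();
            throw e;
        }
    }

    long pollErrorTotal() {
        return pollErrors.get();
    }
}
```

For example, a `PollSketch` whose source throws `RetriableException` returns null from `poll()` and reports a total of 1, while one whose source throws `IllegalStateException` rethrows it and also reports a total of 1.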

--
This message was sent by Atlassian Jira
(v8.20.10#820010)