Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/03/29 17:50:34 UTC

[GitHub] [kafka] rhauch commented on pull request #8844: KAFKA-9887 fix failed task or connector count on startup failure

rhauch commented on pull request #8844:
URL: https://github.com/apache/kafka/pull/8844#issuecomment-809581891


   @C0urante said this:
   > I'm also on the fence about the reshuffling of where `recordConnectorStartupSuccess` and `recordTaskSuccess` are called. The descriptions of the connector/task startup metrics in [KIP-196](https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework) are a little thin; for example, the doc for `connector-startup-success-total` is "The total number of connector starts that succeeded.". A question that we'll want to answer before merging this is: if the framework successfully instantiates a connector and is able to call `start` on it, should that alone qualify as a "successful" startup, or does the call to `start` also have to go off without a hitch?
   
   Later, @michael-carter-instaclustr replied:
   > My fundamental assumption in approaching this was that the ‘connector-startup-failure-total’ metric, described in the KIP as ‘The total number of connector starts that failed’, was intended to be a numerical record of failures within the ‘start’ method of the connector (And likewise for the task based metrics). Or in other words, they represented the health of the worker in an integration sense. (e.g. Does the worker have the right connectivity to do its job? Are people submitting valid configurations or are the users of Connect not understanding how to use it?) This to me seems like a useful aggregate metric that relates to the use of the worker as a whole more than a record of any individual connector failure.
   
   Indeed, the original intent of the `connector-startup-success-total` metric was to represent the number of connectors whose `start()` method ran without throwing an exception. The `connector-startup-failure-total` metric, on the other hand, was to represent the number of connectors whose `start()` method did throw an exception.
   
   The same applies to tasks and the corresponding task metrics.
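
   To make that intent concrete, here is a minimal sketch of the counting rule: success is recorded only when `start()` returns normally, and failure is recorded when it throws. The `Startable` interface and `StartupMetrics` class are hypothetical stand-ins invented for illustration; they are not the Connect framework's actual classes, and this is not the code in the PR.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a connector or task; not a Kafka Connect interface.
interface Startable {
    void start() throws Exception;
}

// Hypothetical counters mirroring the intent of connector-startup-success-total
// and connector-startup-failure-total; not the framework's real metrics class.
class StartupMetrics {
    private final AtomicLong successTotal = new AtomicLong();
    private final AtomicLong failureTotal = new AtomicLong();

    // Calls start() and records the outcome.
    // Returns true if start() completed normally, false if it threw.
    boolean startAndRecord(Startable component) {
        try {
            component.start();
        } catch (Exception e) {
            // start() threw: this counts as a failed startup.
            failureTotal.incrementAndGet();
            return false;
        }
        // Reached only when start() returned without throwing.
        successTotal.incrementAndGet();
        return true;
    }

    long successTotal() {
        return successTotal.get();
    }

    long failureTotal() {
        return failureTotal.get();
    }
}
```

   Under this rule, merely instantiating a component and reaching the call to `start()` does not count as a success; the success counter moves only after the call returns.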

