You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Randall Hauch (JIRA)" <ji...@apache.org> on 2016/06/10 01:08:20 UTC

[jira] [Comment Edited] (KAFKA-3816) Provide more context in Kafka Connect log messages

    [ https://issues.apache.org/jira/browse/KAFKA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323684#comment-15323684 ] 

Randall Hauch edited comment on KAFKA-3816 at 6/10/16 1:07 AM:
---------------------------------------------------------------

{quote}
One thing we might want to do in conjunction with this is to allow the framework to allocate threads for connectors and set this up. Background threads that monitor sources are a particular case that we don't provide any help to connector developers for and would mean we might be missing this info in those threads.
{quote}

The simple approach is to have the connector code that creates threads set this up, but I agree it's fairly complex and will likely be ignored or not done. We can provide a factory method for threads, perhaps on the context, but we'd probably have to provide multiple overloaded methods to provide the flexibility they need. 

Perhaps a slightly easier way is to provide to the contexts a new interface called {{LoggingContext}} that has methods that connector and task implementations can call:

{code:java}
public interface LoggingContext {
     public void use(String contextName);
}
{code}

The reason I'd suggest an interface is just in case we want to provide additional helper methods in the future. For example, with Java 8 it's really easy to temporarily perform logic within a custom context: 

{code:java}
loggingContext.withContext("newContextName", ()->{ ... });
{code}

where that lambda might log messages, and after the lambda completes it automatically sets the MDC context back to what it was before this method is called. (I'm not saying we need this now, but I wouldn't be surprised if we want to add a few helper methods in the future.)

I'm also not a fan of hiding too much (e.g., if Kafka Connector provides subclasses of {{SourceTask}} that handle some of the threading automatically). In Debezium our {{SourceTask}} implementation actually doesn't create the threads; instead, it creates 1 or more "reader" components (depending upon what needs to be done) that are used sequentially, and each reader implementations (e.g., snapshot reader, binlog reader, etc.) create their own threads if needed or uses libraries that create threads. And while libraries rarely give you the control you need for thread creation, they often take callbacks or make other calls to methods that you provide, and its in those methods that you have a chance to set the MDC parameters.

{quote}
For the naming – the scope makes sense but we want a reliable way to extract this from the connector without it having to provide extra info if possible. Basing this on class name could work. My guess is that 3 parameters should be sufficient (and if not, we probably need to do some refactoring to simplify that code), but we'll just have to try it to find out.
{quote}

Yeah, I agree. Perhaps a non-abstract method (e.g., {{getConnectorTypeName()}}) on the Connector class that defaults to grabbing the class short name. Then, subclasses can always override this to provide something shorter and more meaningful.


was (Author: rhauch):
{quote}
One thing we might want to do in conjunction with this is to allow the framework to allocate threads for connectors and set this up. Background threads that monitor sources are a particular case that we don't provide any help to connector developers for and would mean we might be missing this info in those threads.
{quote}

The simple approach is to have the connector code that creates threads set this up, but I agree it's fairly complex and will likely be ignored or not done. We can provide a factory method for threads, perhaps on the context, but we'd probably have to provide multiple overloaded methods to provide the flexibility they need. 

Perhaps a slightly easier way is to provide to the contexts a new interface called {{LoggingContext}} that has methods that connector and task implementations can call:

{code:java}
public interface LoggingContext {
     public void use(String contextName);
}
{code:java}

The reason I'd suggest an interface is just in case we want to provide additional helper methods in the future. For example, with Java 8 it's really easy to temporarily perform logic within a custom context: 

{code:java}
loggingContext.withContext("newContextName", ()->{ ... });
{code}

where that lambda might log messages, and after the lambda completes it automatically sets the MDC context back to what it was before this method is called. (I'm not saying we need this now, but I wouldn't be surprised if we want to add a few helper methods in the future.)

I'm also not a fan of hiding too much (e.g., if Kafka Connector provides subclasses of {{SourceTask}} that handle some of the threading automatically). In Debezium our {{SourceTask}} implementation actually doesn't create the threads; instead, it creates 1 or more "reader" components (depending upon what needs to be done) that are used sequentially, and each reader implementations (e.g., snapshot reader, binlog reader, etc.) create their own threads if needed or uses libraries that create threads. And while libraries rarely give you the control you need for thread creation, they often take callbacks or make other calls to methods that you provide, and its in those methods that you have a chance to set the MDC parameters.

{quote}
For the naming – the scope makes sense but we want a reliable way to extract this from the connector without it having to provide extra info if possible. Basing this on class name could work. My guess is that 3 parameters should be sufficient (and if not, we probably need to do some refactoring to simplify that code), but we'll just have to try it to find out.
{quote}

Yeah, I agree. Perhaps a non-abstract method (e.g., {{getConnectorTypeName()}}) on the Connector class that defaults to grabbing the class short name. Then, subclasses can always override this to provide something shorter and more meaningful.

> Provide more context in Kafka Connect log messages
> --------------------------------------------------
>
>                 Key: KAFKA-3816
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3816
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.9.0.1
>            Reporter: Randall Hauch
>            Assignee: Ewen Cheslack-Postava
>
> Currently it is relatively difficult to correlate individual log messages with the various threads and activities that are going on within a Kafka Connect worker, let along a cluster of workers. Log messages should provide more context to make it easier and to allow log scraping tools to coalesce related log messages.
> One simple way to do this is by using _mapped diagnostic contexts_, or MDC. This is supported by the SLF4J API, and by the Logback and Log4J logging frameworks.
> Basically, the framework would be changed so that each thread is configured with one or more MDC parameters using the {{org.slf4j.MDC.put(String,String)}} method in SLF4J. Once that thread is configured, all log messages made using that thread have that context. The logs can then be configured to use those parameters.
> It would be ideal to define a convention for connectors and the Kafka Connect framework. A single set of MDC parameters means that the logging framework can use the specific parameters on its message formats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)