Posted to jira@kafka.apache.org by "whsoul (Jira)" <ji...@apache.org> on 2020/01/15 10:31:00 UTC

[jira] [Created] (KAFKA-9436) New Kafka Connect SMT for plainText => Struct(or Map)

whsoul created KAFKA-9436:
-----------------------------

             Summary: New Kafka Connect SMT for plainText => Struct(or Map)
                 Key: KAFKA-9436
                 URL: https://issues.apache.org/jira/browse/KAFKA-9436
             Project: Kafka
          Issue Type: Improvement
          Components: KafkaConnect
            Reporter: whsoul


I'd like to use an SMT to parse plain-text rows, convert them to Struct (or Map) data, and load the result into a document database such as MongoDB or Elasticsearch.

 

For example

 

A plain-text Apache access log line:
{code:java}
"111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
{code}
The Connect SMT config below uses a regular expression to transform that plain text into Struct (or Map) data.

 
{code:java}
"transforms": "TimestampTopic, RegexTransform",
"transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",

"transforms.RegexTransform.struct.field": "message",
"transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",

"transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
{code}
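
To illustrate the intended behavior, here is a minimal, dependency-free sketch of how the regex and mapping above could turn the sample log line into named fields. It is not the code from the PR: the class name RegexToMapSketch and the parse method are hypothetical, a plain java.util.Map stands in for a Connect Struct, and treating the ":NUMBER" suffix as "convert this group to a number" is an assumption about the mapping syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; names and behavior are assumptions, not the proposed SMT.
class RegexToMapSketch {

    // Sample log line from this issue, unescaped from its JSON string form.
    static final String LINE =
        "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] "
        + "\"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 "
        + "\"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) "
        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";

    // Regex and mapping copied from the config in this issue.
    static final String REGEX =
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
        + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
        + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"";

    static final String MAPPING =
        "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,"
        + "Response,BytesSent,Ms:NUMBER,Referrer,UserAgent";

    // Names capture group i+1 after mapping entry i; returns null when the line does not match.
    static Map<String, Object> parse(String line, String regex, String mapping) {
        String[] fields = mapping.split(",");
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.find() || m.groupCount() < fields.length) {
            return null;
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            String[] nameAndType = fields[i].trim().split(":", 2);
            String raw = m.group(i + 1);
            Object value = raw;
            // Assumption: a ":NUMBER" suffix requests a numeric value; unparseable input becomes null.
            if (nameAndType.length == 2 && nameAndType[1].equals("NUMBER")) {
                try {
                    value = Long.valueOf(raw);
                } catch (NumberFormatException e) {
                    value = null;
                }
            }
            out.put(nameAndType[0], value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Print each extracted field on its own line.
        Map<String, Object> fields = parse(LINE, REGEX, MAPPING);
        if (fields == null) {
            System.out.println("line did not match");
            return;
        }
        fields.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

In this sketch every field stays a String except Ms, which becomes a Long. A real transform would build a Connect Schema and Struct from the same mapping instead of a Map.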
 

I have a PR for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)