Posted to jira@kafka.apache.org by "whsoul (Jira)" <ji...@apache.org> on 2020/01/15 10:31:00 UTC
[jira] [Created] (KAFKA-9436) New Kafka Connect SMT for plainText => Struct(or Map)
whsoul created KAFKA-9436:
-----------------------------
Summary: New Kafka Connect SMT for plainText => Struct(or Map)
Key: KAFKA-9436
URL: https://issues.apache.org/jira/browse/KAFKA-9436
Project: Kafka
Issue Type: Improvement
Components: KafkaConnect
Reporter: whsoul
I'd like to use an SMT to parse plain-text rows, convert them to Struct (or Map) data, and load the result into a document database such as MongoDB or Elasticsearch.
For example, take this plain-text Apache log line:
{code:java}
"111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
{code}
An SMT configured with a regular expression, as in the connector config below, can transform such a plain-text line into Struct (or Map) data.
{code:java}
"transforms": "TimestampTopic, RegexTransform",
"transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
"transforms.RegexTransform.struct.field": "message",
"transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
"transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
{code}
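To illustrate the idea, here is a minimal standalone sketch of how the capture groups of the regex above could be paired with the names in the mapping. It is not the code from the PR and does not use the Kafka Connect API; the class name RegexToMapSketch and the reading of "Ms:NUMBER" as "convert this group to a number" are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not the proposed SMT, only the regex-to-map step it describes.
class RegexToMapSketch {
    // Same regular expression as in the connector config above.
    static final Pattern LOG = Pattern.compile(
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
        + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
        + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"");

    // Same mapping string as in the config; one entry per capture group, in order.
    static final String[] MAPPING =
        ("IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,"
         + "Response,BytesSent,Ms:NUMBER,Referrer,UserAgent").split(",");

    // The sample Apache log line from this issue.
    static final String SAMPLE =
        "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] "
        + "\"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 "
        + "\"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) "
        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";

    // Returns field name -> value, or null when the line does not match.
    static Map<String, Object> parse(String line) {
        Matcher m = LOG.matcher(line);
        if (!m.find()) {
            return null;
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < MAPPING.length; i++) {
            String[] nameAndType = MAPPING[i].split(":");
            String raw = m.group(i + 1); // capture groups are 1-based
            // Assumption: a ":NUMBER" suffix means the value is parsed as a long.
            if (nameAndType.length > 1 && nameAndType[1].equals("NUMBER")) {
                try {
                    out.put(nameAndType[0], Long.valueOf(raw));
                } catch (NumberFormatException e) {
                    out.put(nameAndType[0], null); // e.g. "-" in the log
                }
            } else {
                out.put(nameAndType[0], raw);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```

In an actual SMT the resulting fields would presumably be placed into a Connect Struct (with a schema) or a Map on the record value, rather than returned as a plain Java map.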
I have a PR for this.