Posted to jira@kafka.apache.org by "whsoul (Jira)" <ji...@apache.org> on 2020/01/15 10:46:00 UTC
[jira] [Commented] (KAFKA-9436) New Kafka Connect SMT for plainText => Struct(or Map)
[ https://issues.apache.org/jira/browse/KAFKA-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015833#comment-17015833 ]
whsoul commented on KAFKA-9436:
-------------------------------
[https://github.com/apache/kafka/pull/7965]
> New Kafka Connect SMT for plainText => Struct(or Map)
> -----------------------------------------------------
>
> Key: KAFKA-9436
> URL: https://issues.apache.org/jira/browse/KAFKA-9436
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Reporter: whsoul
> Priority: Major
>
> I'd like to use an SMT to parse plain-text rows, convert them to Struct (or Map) data, and load them into document databases such as MongoDB, Elasticsearch, etc.
>
> For example, given this plain-text Apache access log line:
> {code:java}
> "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
> {code}
> the SMT config below uses a regular expression to transform the plain-text line into Struct (or Map) data.
>
> {code:java}
> "transforms": "TimestampTopic, RegexTransform",
> "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
> "transforms.RegexTransform.struct.field": "message",
> "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
> "transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
> {code}
>
> I have a PR for this.
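
To illustrate what the regex and mapping in the quoted config are meant to do, here is a minimal standalone sketch in plain Java. It is not the code from the PR: the class name `RegexToMapSketch`, the `parse` method, and the reading of the `:NUMBER` suffix as "convert to `Long`" are assumptions made for illustration only. It applies the regex to the sample log line and pairs each capture group with the field name at the same position in the mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and NUMBER handling are assumptions, not the PR's implementation.
class RegexToMapSketch {

    // Regex and mapping copied from the quoted config (JSON escapes are also valid Java escapes here).
    static final String REGEX =
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
        + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
        + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"";
    static final String MAPPING =
        "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,"
        + "Response,BytesSent,Ms:NUMBER,Referrer,UserAgent";

    // Sample Apache log line from the issue description, unescaped.
    static final String SAMPLE =
        "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] "
        + "\"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 "
        + "\"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) "
        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";

    // Maps capture group i+1 to the i-th name in the comma-separated mapping.
    static Map<String, Object> parse(String line, String regex, String mapping) {
        Matcher m = Pattern.compile(regex).matcher(line);
        String[] fields = mapping.split(",");
        if (!m.matches()) {
            throw new IllegalArgumentException("line does not match the regex");
        }
        if (m.groupCount() != fields.length) {
            throw new IllegalArgumentException("mapping size differs from capture group count");
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            // "Ms:NUMBER" -> name "Ms", type hint "NUMBER"; no suffix means keep the string.
            String[] nameAndType = fields[i].trim().split(":", 2);
            String raw = m.group(i + 1);
            boolean numeric = nameAndType.length == 2 && nameAndType[1].equals("NUMBER");
            out.put(nameAndType[0], numeric ? (Object) Long.valueOf(raw) : raw);
        }
        return out;
    }

    public static void main(String[] args) {
        // Print each extracted field in mapping order.
        parse(SAMPLE, REGEX, MAPPING).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In this sketch a non-matching line raises an exception; a real transform would need to decide whether to fail, skip, or pass such records through unchanged.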